Abstract

This is a paper about private data analysis, in which a trusted curator holding a confidential
database responds to real vector-valued queries. A common approach to ensuring privacy for the
database elements is to add appropriately generated random noise to the answers, releasing only
these noisy responses. A line of study initiated in [DN03] examines the amount of distortion
needed to prevent privacy violations of various kinds. The results in the literature vary according
to several parameters, including the size of the database, the size of the universe from which
data elements are drawn, the "amount" of privacy desired, and for the purposes of the current
work, the arity of the query. In this paper we sharpen and unify these bounds. Our foremost
result combines the techniques of Hardt and Talwar [HT10] and McGregor et al. [MMP+10] to
obtain linear lower bounds on distortion when providing differential privacy for a (contrived)
class of low-sensitivity queries. (A query has low sensitivity if the data of a single individual
has small effect on the answer.) Several structural results follow as immediate corollaries:

• We separate so-called counting queries from arbitrary low-sensitivity queries, proving the
latter requires more noise, or distortion, than does the former;

• We separate $(\epsilon, 0)$-differential privacy from its well-studied relaxation
$(\epsilon, \delta)$-differential privacy, even when $\delta \in 2^{-o(n)}$ is negligible in the
size $n$ of the database, proving the latter requires less distortion than the former;

• We demonstrate that $(\epsilon, \delta)$-differential privacy is much weaker than
$(\epsilon, 0)$-differential privacy in terms of mutual information of the transcript of the
mechanism with the database, even when $\delta \in 2^{-o(n)}$ is negligible in the size $n$ of
the database.

We also simplify the lower bounds on noise for counting queries in [HT10] and also make them
unconditional. Further, we use a characterization of $(\epsilon, \delta)$-differential privacy from
[MMP+10] to obtain lower bounds on the distortion needed to ensure $(\epsilon, \delta)$-differential
privacy for $\epsilon, \delta > 0$.

Next, we revisit the LP decoding argument of [DMT07] and combine it with recent results
of Rudelson [Rud11] to show that for some specific $\eta > 0$, if the $\ell$-way marginals are released
such that at least $1 - \eta$ fraction of the entries have $o(\sqrt{n})$ noise, then a very minimal notion of
privacy called attribute privacy is violated. This improves on a recent result of Kasiviswanathan
et al. [KRSU10] where the same conclusion was shown assuming that all the entries have $o(\sqrt{n})$
noise.

Finally, we extend the original lower bound of [DN03] to prevent blatant non-privacy to the
case when the universe size is smaller than the size of the database. As we show, the lower
bound on the noise required to prevent blatant non-privacy becomes larger as the size of the
universe decreases.
1 Introduction

This is a paper about private data analysis, in which a trusted curator holding a confidential
database responds to real vector-valued queries. Specifically, we focus on the practice of ensuring
privacy for the database elements by adding appropriately generated random noise to the answers,
releasing only these noisy responses. A line of study initiated by Dinur and Nissim examines the
amount of distortion needed to prevent privacy violations of various kinds [DN03]. Dinur and Nissim
did not have a definition of privacy; rather, they had a notion that has come to be called blatant
non-privacy; the modest goal, then, was to add enough distortion to avert blatant non-privacy.
Since that time, the community has raised the bar by defining (and achieving) powerful and
comprehensive notions of privacy [DN03, DMNS06, DKM+06], and the goal has been to preserve
$(\epsilon, 0)$-differential privacy and its relaxation, $(\epsilon, \delta)$-differential privacy. A
final goal considered herein, attribute privacy, has a more complicated description, but may be
thought of as preventing blatant non-privacy for a single data attribute [KRSU10] in the presence
of a certain kind of contingency table query.

The results in the literature vary according to several parameters, including the number n of 
elements in the database, the size d of the universe from which data elements are drawn, the 
"amount" and type of privacy desired, and for the purposes of the current work, the arity k of the 
query. In this paper we strengthen and unify these bounds. 

As corollaries of our work, we obtain several "structural" results regarding different types of privacy 
guarantees: 

• We separate so-called counting queries from arbitrary low-sensitivity queries, proving the
latter requires more noise, or distortion, than does the former;

• We separate $(\epsilon, 0)$-differential privacy from its well-studied relaxation
$(\epsilon, \delta)$-differential privacy, even when $\delta \in 2^{-o(n)}$ is negligible in the
size $n$ of the database, proving the latter requires less distortion than the former;

• We demonstrate that $(\epsilon, \delta)$-differential privacy is much weaker than
$(\epsilon, 0)$-differential privacy in terms of mutual information of the transcript of the
mechanism with the database even when $\delta \in 2^{-o(n)}$ is negligible in the size $n$ of
the database.

We also simplify the lower bounds on noise for counting queries in [HT10] and make them
unconditional, removing a technical assumption on the mechanism present in their paper. Next, we
use a characterization of $(\epsilon, \delta)$-differential privacy from [MMP+10] to obtain lower
bounds on the distortion needed to ensure $(\epsilon, \delta)$-differential privacy for
$\epsilon, \delta > 0$. We remark that [KRSU10] also obtain quantitatively similar lower bounds on
the distortion required to maintain $(\epsilon, \delta)$-differential privacy for the class of
$\ell$-way marginals, though their proof technique is very different and arguably much more
complicated.

After this, we combine results of Rudelson [Rud11] with LP decoding to show that
attribute privacy is violated if $\ell$-way marginals are released such that at least a $1 - \eta$
fraction of these marginals have $o(\sqrt{n})$ noise, for some $\eta > 0$. The results and the
technique in [KRSU10] required $\eta = 0$, making our results more powerful. Finally, we extend the
results of [DN03] to the case of small universe size, achieving stronger lower bounds to prevent
blatant non-privacy.

To describe our results even at a high level we must outline the privacy-preserving database model,
the notion of distortion or noise that may be employed in order to preserve privacy, and the meaning
of the goals of the adversary: blatant non-privacy, violation of $(\epsilon, 0)$-differential privacy,
violation of $(\epsilon, \delta)$-differential privacy, and attribute non-privacy.

Typically, the curator of a database receives questions to which it responds with potentially noisy 
answers. There are two possible settings here. One is that the queries are received by the curator 
one at a time. The other situation is that all the queries are received by the curator at once and it 
then publishes (noisy) answers to all of them at once. The former is called the interactive setting 
and the latter is called the non-interactive setting. All our lower bounds are in the non-interactive 
setting making them applicable to the interactive setting as well. 

We now formally describe a database and a query: A database $X$ is an element of $(\mathbb{Z}^+)^d$.
Here $d$ is called the universe size and intuitively refers to the number of types of elements
present in the database. Also, for a database $X$, $n = \sum_{i=1}^{d} X_i$ is defined as the size of
the database and refers to the number of elements in the database. Note that we are representing
databases as histograms. A query (of arity $k$) is a map $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such
that $\forall i \in [k]$, $\forall x, y \in (\mathbb{Z}^+)^d$, $|F(x + y)_i - F(x)_i| \le 1$
if $\|y\|_1 = 1$. In other words, every coordinate of the map $F$ is 1-Lipschitz. We say $F$ is a
counting query if $F$ is a linear map. The meaning of $d, k, n$ throughout the paper shall be the
same as above unless mentioned otherwise.
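
As a concrete illustration of these definitions, the following minimal sketch (in Python, which is our choice and not part of the original text) represents a database as a histogram and a counting query as a matrix acting on it. The helper names `database_size`, `counting_query` and `is_coordinatewise_1_lipschitz`, as well as the toy data, are hypothetical.

```python
from typing import List, Sequence


def database_size(x: Sequence[int]) -> int:
    """Size n of a histogram database x in (Z^+)^d: the total number of elements."""
    return sum(x)


def counting_query(rows: Sequence[Sequence[float]], x: Sequence[int]) -> List[float]:
    """Evaluate the linear map F(x) = A x, where `rows` holds the k rows of A."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in rows]


def is_coordinatewise_1_lipschitz(rows: Sequence[Sequence[float]]) -> bool:
    """For a linear F, adding one element of type j changes coordinate i by A[i][j],
    so every coordinate is 1-Lipschitz exactly when all |A[i][j]| <= 1."""
    return all(abs(a) <= 1 for row in rows for a in row)


x = [3, 0, 2, 1]             # d = 4 element types, n = 6 elements (toy data)
A = [[1, 0, 1, 0],           # k = 2 linear queries
     [1, -1, 0, 1]]
print(database_size(x))      # 6
print(counting_query(A, x))  # [5, 4]
```

Under this representation, the Lipschitz condition on a counting query reduces to a bound on the entries of the matrix, which is what the last helper checks.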

We now formally introduce the definition of mechanism and privacy. 

Definition 1.1 Let $\mathcal{F}$ be a family of queries such that $\forall F \in \mathcal{F}$,
$F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$. Then, a mechanism is a map
$M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$ where $\mu(\mathbb{R}^k)$ is simply the
set of probability distributions over $\mathbb{R}^k$. On being given a query $F \in \mathcal{F}$ and a
database $x \in (\mathbb{Z}^+)^d$, the curator samples $z$ from the probability distribution $M(x, F)$
and returns $z$.

We next state the definition of $\epsilon$-differential privacy (introduced by Dwork et al. in
[DMNS06]) and $(\epsilon, \delta)$-differential privacy (introduced by Dwork et al. in [DKM+06]).

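Definition 1.2 For a family of queries $\mathcal{F}$, a mechanism
$M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$ is said to be
$\epsilon$-differentially private if for every $x, y \in (\mathbb{Z}^+)^d$ such that
$\|x - y\|_1 \le 1$, every measurable set $S \subseteq \mathbb{R}^k$ and $\forall F \in \mathcal{F}$,
the following holds: Let $M(x, F) = M_{x,F}$ and $M(y, F) = M_{y,F}$ and for a probability
distribution $\mathcal{D}$, let $\mathcal{D}(S)$ denote the probability of set $S$ under $\mathcal{D}$.
Then,

$$2^{-\epsilon} \le \frac{M_{x,F}(S)}{M_{y,F}(S)} \le 2^{\epsilon}.$$

The mechanism is said to be $(\epsilon, \delta)$-differentially private if

$$2^{-\epsilon} \cdot M_{y,F}(S) - \delta \le M_{x,F}(S) \le 2^{\epsilon} \cdot M_{y,F}(S) + \delta.$$

Typically, $\delta$ is set to be negligible in $n, k$.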
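
The two-sided condition of Definition 1.2 can be checked mechanically when the output space is finite. The sketch below is an illustration under that assumption only (the mechanisms in this paper output vectors in $\mathbb{R}^k$, so this is a toy model). `satisfies_dp` is a hypothetical helper, and the two distributions are a made-up randomized-response example.

```python
from typing import Sequence


def satisfies_dp(p: Sequence[float], q: Sequence[float],
                 eps: float, delta: float = 0.0, tol: float = 1e-12) -> bool:
    """Check 2^-eps * q(S) - delta <= p(S) <= 2^eps * q(S) + delta for every event S,
    where p and q are the output distributions on two neighbouring databases.

    The worst event for the upper bound collects every outcome z with
    p[z] > 2^eps * q[z]; the lower bound is handled symmetrically.
    `tol` absorbs floating-point rounding."""
    upper_excess = sum(max(0.0, pz - 2.0 ** eps * qz) for pz, qz in zip(p, q))
    lower_excess = sum(max(0.0, 2.0 ** (-eps) * qz - pz) for pz, qz in zip(p, q))
    return upper_excess <= delta + tol and lower_excess <= delta + tol


# Randomized response on one bit: report the true bit with probability 2/3.
p = [2 / 3, 1 / 3]   # output distribution when the bit is 0
q = [1 / 3, 2 / 3]   # output distribution when the bit is 1
print(satisfies_dp(p, q, eps=1.0))              # likelihood ratio is exactly 2 = 2^1
print(satisfies_dp(p, q, eps=0.5))              # too strict without slack
print(satisfies_dp(p, q, eps=0.5, delta=0.2))   # the additive slack delta absorbs the excess
```

Note the base-2 exponent, which follows the convention of Definition 1.2 rather than the more common base $e$.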

We remark that we do not define the notion of noise very precisely here as the notion of noise 
depends on the context. However, in the context of differential privacy, we use the following 
definition of noise. 

Definition 1.3 For a family of queries $\mathcal{F}$, a mechanism
$M : (\mathbb{Z}^+)^d \times \mathcal{F} \to \mu(\mathbb{R}^k)$ is said to
add noise (at most) $\eta$ if with high probability (say 0.99) over the randomness of $M$,
$\|M(x, F) - F(x)\|_\infty \le \eta$.



While differential privacy is a very strong notion of privacy, sometimes one can show that even 
very modest definitions of privacy get violated. One such notion is that of blatant non-privacy. We 
say that a mechanism $M$ for answering $F$ over databases of size $n$ and universe size $d$ is blatantly
non-private, if there is an attack $A$ such that w.h.p. over the answer $y$ returned by the mechanism
$M$, $A(y)$ differs from the database only at $o(1)$ fraction of the places. Yet another very weak notion
of privacy that is interesting to us is that of attribute non-privacy. The formal definition follows : 

Definition 1.4 For a query $F \in \mathcal{F}$, a mechanism
$M : (\{0, 1\}^d)^n \times \mathcal{F} \to \mathbb{R}^k$ is said to be attribute
non-private if there exists $Y \in (\{0, 1\}^{d-1})^n$ and an algorithm $A$ such that for every
$x \in \{0, 1\}^n$,

$$P_{z \in M(Y \circ x, F)}\left[ A(z) = x' : \|x - x'\|_1 = o(\|x\|_1) \right] \ge 1/10$$

where $Y \circ x$ simply denotes the obvious concatenation of $Y$ and $x$. $A$ need not be
computationally efficient and the constant 1/10 is arbitrary and can be replaced by any positive
constant.

We show the following results:

1. Combining techniques from [HT10] and [MMP+10], we obtain tight lower bounds on the noise
for arbitrary (non-counting) low-sensitivity queries for any $(\epsilon, 0)$-differentially private
mechanism. Given positive results of Blum, Ligett, and Roth [BLR08], this separates non-counting
queries from counting queries, proving that the former require more distortion than the latter
for maintaining differential privacy. Also, given the positive results of [DKM+06] for arbitrary
low-sensitivity queries, this separates $(\epsilon, \delta)$-differential privacy from
$(\epsilon, 0)$-differential privacy, where $\delta = \delta(n, k)$ denotes a function negligible in
its argument. We also use this technique to show that the guarantees in terms of information
content are drastically weaker for an $(\epsilon, \delta)$-differentially private protocol as
compared to an $\epsilon$-differentially private protocol. Our technique also simplifies the
volume-based lower bounds on noise for counting queries in [HT10]. In addition, we also make the
lower bounds unconditional. The lower bound in [HT10] required the mechanism to be defined on
"fractional" databases i.e., on $(\mathbb{R}^+)^d$ as opposed to just $(\mathbb{Z}^+)^d$, while we do
not have any such restrictions.

2. We give tight lower bounds on noise for ensuring $(\epsilon, \delta)$-differential privacy for
$\delta > 0$. This proof relies on a lemma due to [MMP+10] showing that
$(\epsilon, \delta)$-differentially private mechanisms yield a certain kind of unpredictable source.
On the other hand, any mechanism that is blatantly non-private cannot yield an unpredictable
source. Thus, if the noise is insufficient to prevent blatant non-privacy then it cannot provide
$(\epsilon, \delta)$-differential privacy. We subsequently use the lower bounds of [DN03, DMT07] for
preventing blatant non-privacy to get lower bounds on the distortion for
$(\epsilon, \delta)$-differential privacy.

3. We revisit the LP decoding attack of Dwork, McSherry, and Talwar [DMT07], observing that
any linear query matrix yielding a Euclidean section suffices for the attack. The LP decoding
attack succeeds even if a certain constant fraction of the responses have wild noise. Armed
with the connection to Euclidean sections, and a recent result of Rudelson [Rud11] bounding
from below the least singular value of the Hadamard product of certain i.i.d. matrices,
we qualitatively strengthen a lower bound of Kasiviswanathan, Rudelson, Smith, and Ullman
[KRSU10] on the noise needed to avert attribute non-privacy in $\ell$-way marginals release
by making the attack resilient to a constant fraction of wild responses.

4. We show new lower bounds for blatant non-privacy for the case when the size of the universe
is smaller than the size of the database. In particular, we show that there is a counting
query such that if the distortion added on at least $1/2 + \eta$ fraction of the answers is bounded
by $o(n/\sqrt{d})$ (for some $\eta > 0$), then there is an attack which recovers a database different
from the original database by $o(n)$. Our analysis makes use of a result on large deviation of
Rademacher sums.



2 Lower bound by volume arguments 

We now recall the volume based argument of Hardt and Talwar [HT10] to show lower bounds on
the noise required for $\epsilon$-differential privacy.

Theorem 2.1 Assume $x_1, \ldots, x_{2^s} \in (\mathbb{Z}^+)^d$ such that $\forall i$, $\|x_i\|_1 \le n$
and for $i \ne j$, $\|x_i - x_j\|_1 \le \Delta$. Further, let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$
be such that for any $i \ne j$, $\|F(x_i) - F(x_j)\|_\infty \ge \eta$. If $\Delta < (s - 1)/\epsilon$,
then any mechanism which is $\epsilon$-differentially private for the query $F$ on databases of size
$n$ must add noise $\eta/2$.

While the line of reasoning in the proof is the same as that of [HT10], we do the proof here as the
argument in [HT10] works only for counting queries i.e., when $F$ is a linear transformation. On
the other hand, the statement and proof of our result work for any query $F$.

Proof: Consider the $\ell_\infty$ balls of radius $\eta/2$ around each of the $F(x_i)$. By the
hypothesis, these balls are disjoint. Now consider any mechanism $M$ which adds noise $\eta/2$, and
consider any $x_i$. Then, because all the balls are disjoint, there is some $j \ne i$ such that if
$S$ is the $\ell_\infty$ ball of radius $\eta/2$ around $F(x_j)$, then

$$P_{z \in M(x_i, F)}[z \in S] \le 2^{-s}.$$

However, because the noise added by the mechanism $M$ is at most $\eta/2$, we can also say that

$$P_{z \in M(x_j, F)}[z \in S] \ge 1/2.$$

Also, because the mechanism $M$ is $\epsilon$-differentially private and $\|x_i - x_j\|_1 \le \Delta$,

$$\frac{P_{z \in M(x_i, F)}[z \in S]}{P_{z \in M(x_j, F)}[z \in S]} \ge 2^{-\epsilon \Delta}.$$

This leads to a contradiction if $\Delta < (s - 1)/\epsilon$ thus proving the assertion. ■
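
For completeness, the contradiction can be spelled out as a single chain of inequalities. The lower bound on the ratio comes from applying Definition 1.2 once per unit step along a path of at most $\Delta$ neighbouring databases between $x_j$ and $x_i$; this path argument is our reading of the step and is not written out in the text.

```latex
\[
2^{-\epsilon \Delta}
\;\le\; \frac{P_{z \in M(x_i,F)}[z \in S]}{P_{z \in M(x_j,F)}[z \in S]}
\;\le\; \frac{2^{-s}}{1/2} \;=\; 2^{-(s-1)},
\qquad\text{hence}\qquad
\Delta \;\ge\; \frac{s-1}{\epsilon}.
\]
```

This is incompatible with the hypothesis $\Delta < (s - 1)/\epsilon$ of the theorem.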

2.1 Linear lower bound for arbitrary queries 

In this subsection, we prove the following theorem. 

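Theorem 2.2 For any $k, d, n \in \mathbb{N}$ and $1/40 \ge \epsilon > 0$, where
$n \ge \min\{k/\epsilon, d/\epsilon\}$, there is a query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such
that any mechanism $M$ which is $\epsilon$-differentially private adds noise
$\Omega(\min\{d/\epsilon, k/\epsilon\})$.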

If e > 1, then there is a query F : (Z"*") — t- M such that any mechanism M which is e-differentially 
private adds noise n(min{d/(e • 2^''), k/e}) as long as n > inm{k/e,d/{e ■ 2^*^)} 



Before starting the proof, we make a couple of observations. First of all, note that the statement
of the theorem does not give any lower bound for $1 > \epsilon > 1/40$. However, any mechanism which
is $\epsilon$-differentially private for $\epsilon$ in the aforementioned range is also
$\epsilon'$-differentially private for $\epsilon' = 10/9$. Hence, the noise lower bounds for
$\epsilon'$-differential privacy for $\epsilon' = 10/9$ are also applicable for the range
$1 > \epsilon > 1/40$. It is easy to see that up to constant factors, the lower bounds with
$\epsilon' = 10/9$ are optimal for $\epsilon$ in the aforementioned range.

Secondly, we note that it is enough to add noise $O(k/\epsilon)$ to maintain $\epsilon$-differential
privacy (using the Laplacian mechanism). Also, because the databases are of size $n$, it is enough to
add noise $O(n)$ to maintain $\epsilon$-differential privacy for any $\epsilon > 0$. Thus, as long as
$k = O(d)$, our lower bounds are tight up to constant factors. Next, we do the proof of Theorem 2.2.
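
The Laplacian mechanism mentioned above can be sketched as follows. This is a minimal illustration and not a construction from this paper: we assume the standard calibration in which a query with $\ell_1$-sensitivity at most $k$ (each of its $k$ coordinates is 1-Lipschitz) receives independent Laplace noise of scale $k/(\epsilon \ln 2)$, where the factor $\ln 2$ accounts for the base-2 convention of Definition 1.2. The function name and the use of Python's `random` module are our own choices.

```python
import math
import random
from typing import List, Sequence


def laplace_mechanism(answer: Sequence[float], eps: float,
                      rng: random.Random) -> List[float]:
    """Return F(x) plus i.i.d. Laplace noise of scale b = k / (eps * ln 2),
    where k = len(answer) bounds the l1-sensitivity of the query (assumption)."""
    k = len(answer)
    b = k / (eps * math.log(2))
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return [a + rng.expovariate(1 / b) - rng.expovariate(1 / b) for a in answer]


rng = random.Random(0)
true_answer = [12.0, 7.0, 30.0]      # hypothetical F(x) for k = 3
print(laplace_mechanism(true_answer, eps=0.5, rng=rng))
```

The typical magnitude of the noise in each coordinate is proportional to $k/\epsilon$, which is the scale the lower bound of Theorem 2.2 is compared against.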



Proof: Our proof strategy is to construct a set of databases and a query which meet the
conditions stated in the hypothesis of Theorem 2.1 and then get the desired lower bound on the
noise. We first deal with the case when $0 < \epsilon \le 1/40$. Let $\ell = \min\{d, k\}$. We can now
use Claim A.2 to construct $2^s$ databases $x_1, \ldots, x_{2^s}$ (for $s = \ell/400$) such that
$x_i \in (\mathbb{Z}^+)^d$ with the property that $\forall i \ne j$, $\|x_i - x_j\|_1 \ge n'/10$ and
$\|x_i\|_1 \le n'$ where $n' = \ell/(1280\epsilon)$ (the application of Claim A.2 uses
$d' = \ell/320$). Note that our databases are of size bounded by $n' \le n$. We now describe a mapping
$\mathcal{L} : (\mathbb{Z}^+)^d \to \mathbb{R}^{2^s}$ which is related to a construction in [MMP+10].
The mapping is as follows:



• For every $x_i$, there is a coordinate $i$ in the mapping.

• The $i$-th coordinate of $\mathcal{L}(z)$ is $\max\{n'/30 - \|x_i - z\|_1, 0\}$.

Claim 2.3 The map $\mathcal{L}$ is 1-Lipschitz i.e., if $\|z_1 - z_2\|_1 = 1$, then
$\|\mathcal{L}(z_1) - \mathcal{L}(z_2)\|_1 \le 1$.

Proof: We observe that for any $z_1, z_2$ such that $\|z_1 - z_2\|_1 \le 1$, if $A$ denotes the set of
coordinates where at least one of $\mathcal{L}(z_1)$ or $\mathcal{L}(z_2)$ is non-zero, then $A$ is
either empty or is a singleton set. Given this, the statement in the claim is obvious, since the
mapping corresponding to any particular coordinate is clearly 1-Lipschitz. ■
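
The map $\mathcal{L}$ and the singleton-support observation behind Claim 2.3 can be illustrated on a toy instance. In the sketch below, `bump_map` is a hypothetical name, `radius` plays the role of $n'/30$, and the three centres are made-up databases whose pairwise $\ell_1$ distance is three times the radius (mirroring the ratio between $n'/10$ and $n'/30$).

```python
from typing import List, Sequence


def l1(u: Sequence[int], v: Sequence[int]) -> int:
    """l1 distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(u, v))


def bump_map(centers: Sequence[Sequence[int]], radius: float,
             z: Sequence[int]) -> List[float]:
    """Coordinate i equals max(radius - ||x_i - z||_1, 0) for the i-th centre x_i."""
    return [max(radius - l1(x, z), 0.0) for x in centers]


centers = [(6, 0, 0), (0, 6, 0), (0, 0, 6)]   # pairwise l1 distance 12
radius = 4.0                                   # 12 = 3 * radius
z1 = (5, 1, 0)
z2 = (5, 1, 1)                                 # neighbour of z1: one element added

image1 = bump_map(centers, radius, z1)
image2 = bump_map(centers, radius, z2)
print(image1, image2)
print(sum(abs(a - b) for a, b in zip(image1, image2)))   # l1 change of the image
```

Because the bumps around distinct centres are far apart, two neighbouring databases can only touch one bump, so the image moves by at most 1 in $\ell_1$.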

We now describe the queries. Corresponding to any $r \in \{-1, 1\}^{2^s}$, we define
$f_r : (\mathbb{Z}^+)^d \to \mathbb{R}$ as

$$f_r(x) = \sum_{i=1}^{2^s} \mathcal{L}(x)_i \cdot r_i.$$

Now, we define a random map $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ as follows. Pick
$r_1, \ldots, r_k \in \{-1, 1\}^{2^s}$ independently and uniformly at random and define $F$ as follows:

$$F(x) = (f_{r_1}(x), \ldots, f_{r_k}(x)).$$

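Now consider any $x_h, x_j \in S$ such that $h \ne j$, where $S = \{x_1, \ldots, x_{2^s}\}$ is the
set of databases constructed above. Because of the way $\mathcal{L}$ is defined, it is clear that for
any $r_i$,

$$P_{r_i}\left[ |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15 \right] \ge 1/2.$$

A basic application of the Chernoff bound implies that

$$P_{r_1, \ldots, r_k}\left[ \text{For at least } 1/10 \text{ of the } r_i\text{'s, } |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15 \right] \ge 1 - 2^{-k/30}.$$

Now, note that the total number of pairs $(x_h, x_j)$ of databases such that $x_h, x_j \in S$ is at
most $2^{2s} \le 2^{\ell/200} \le 2^{k/200}$. This implies (via a union bound)

$$P_{r_1, \ldots, r_k}\left[ \forall h \ne j, \text{ for at least } 1/10 \text{ of the } r_i\text{'s, } |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15 \right] \ge 1 - 2^{-k/40}.$$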

This implies that we can fix $r_1, \ldots, r_k$ such that the following is true:

$$\forall h \ne j, \text{ for at least } 1/10 \text{ of the } r_i\text{'s, } |f_{r_i}(x_h) - f_{r_i}(x_j)| \ge n'/15.$$

This implies that for any $x_h \ne x_j \in S$, $\|F(x_h) - F(x_j)\|_\infty \ge n'/15$. In fact,
$\|F(x_h) - F(x_j)\|_2 \ge n'\sqrt{k}/150$, which is a much stronger guarantee than what we require
and is quantitatively similar to the results in [HT10] where they consider $\ell_2$ noise as opposed
to $\ell_\infty$ noise.

We can now apply Theorem 2.1 by putting $\Delta = 2n'$ and $s = \ell/400 > 3\epsilon n'$ and
$\eta = n'/15$ and observe that $\Delta < (s - 1)/\epsilon$ thus proving the result.
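
The arithmetic behind this choice of parameters can be checked with exact rational numbers. The following sketch is only an illustration: `check_parameters` is a hypothetical helper, the sample values of $\ell$ and $\epsilon$ are made up, and `delta_cap` stands for the distance bound $\Delta$ (not the privacy parameter $\delta$).

```python
from fractions import Fraction


def check_parameters(ell: int, eps: Fraction) -> dict:
    """Plug s = ell/400, n' = ell/(1280*eps), Delta = 2n' and eta = n'/15
    into the hypothesis Delta < (s - 1)/eps of Theorem 2.1."""
    s = Fraction(ell, 400)
    n_prime = Fraction(ell) / (1280 * eps)
    delta_cap = 2 * n_prime
    eta = n_prime / 15
    return {
        "s > 3*eps*n'": s > 3 * eps * n_prime,
        "Delta < (s-1)/eps": delta_cap < (s - 1) / eps,
        "noise lower bound eta/2": eta / 2,
    }


print(check_parameters(4000, Fraction(1, 40)))
print(check_parameters(800, Fraction(1, 40)))
```

In this sketch the strict inequality $\Delta < (s - 1)/\epsilon$ holds for the first call and fails for the second; the substitution only goes through once $\ell$ exceeds a fixed constant, and the resulting bound $\eta/2 = \ell/(38400\epsilon)$ is linear in $\min\{d, k\}/\epsilon$.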

We next deal with the case when $\epsilon > 1$. This part of the proof differs from the case when
$\epsilon < 1$ only in the construction of $x_1, \ldots, x_{2^s}$. We also emphasize that had we not
insisted on integral databases, our proof would have been identical to the first part. We construct
the databases $x_1, \ldots, x_{2^s}$ using combinatorial designs. More precisely, for some
sufficiently large constant $C$, let
i = min{(i/(C • 2^'^),/c}. We can now use Claim lA^ to construct 2** databases xi, . . . ,X2'> (for 
$s = \ell/400$) such that $x_i \in (\mathbb{Z}^+)^d$ with the property that $\forall i \ne j$,
$\|x_i - x_j\|_1 \ge n'/10$ and $\|x_i\|_1 \le n'$ where $n' = \ell/(1280\epsilon)$ (using
$d' = \ell/320$ in Claim A.3). Again, we note here that the databases constructed are of size $n'$.

From this point onwards, we define the map $\mathcal{L}$ and the query $F$ as we did in the first
part of the proof of Theorem 2.2 and the proof proceeds identically. In particular, we get a query
$F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that for any $i \ne j$,
$\|F(x_i) - F(x_j)\|_2 \ge n'\sqrt{k}/150$. As before, we can now apply Theorem 2.1 by putting
$\Delta = 2n'$ and $s = \ell/400 > 3\epsilon n'$ and $\eta = n'/15$ and observe that
$\Delta < (s - 1)/\epsilon$ thus proving the result. ■

For the subsequent part of this paper, we only consider lower bounds on $\epsilon$-differential
privacy for $0 < \epsilon < 1$ as opposed to $\epsilon > 1$. This is because the privacy guarantees
one gets become meaningless when $\epsilon$ is large. However, we do remark that the results can be
carried over in a straightforward way to the regime of $\epsilon > 1$ using combinatorial designs
(like we did for Theorem 2.2).

Consequences of the linear lower bound

We briefly describe the two consequences of the linear lower bound on the noise proven in
Theorem 2.2. The first is separation of counting queries from non-counting queries. While our
separation gives quantitatively the same results as long as $d = k^{O(1)}$ and
$n = \Theta(k/\epsilon)$, for simplicity, we consider the setting when $k = d$ and $n = k/\epsilon$.
In this case, Theorem 2.2 shows existence of a (non-counting) query such that maintaining
$\epsilon$-differential privacy requires noise $\Omega(n)$. On the other hand, [BLR08] had proven
that for any counting query with the same setting of parameters, there is a mechanism which adds
noise $O(n^{2/3})$ and maintains $\epsilon$-differential privacy. This shows that maintaining
$\epsilon$-differential privacy inherently requires more distortion in case of non-counting
queries than counting queries.

The next consequence is a separation of $(\epsilon, \delta)$-differential privacy from
$(\epsilon, 0)$-differential privacy for $\delta = 2^{-o(n)}$. We note that Hardt and Talwar [HT10]
had shown such a separation but that was only when $k = O(\log n)$ and $\delta = n^{-O(1)}$. Again,
we use the setting of parameters when $k = d$ and $n = k/\epsilon$. The Gaussian mechanism of
[DKM+06] shows that to maintain $(\epsilon, \delta)$-differential privacy for any $k$ queries, it
suffices to add noise $O(\sqrt{k \log(1/\delta)}/\epsilon) = o(n)$. However, Theorem 2.2 shows
that there is a query which requires adding noise $\Omega(n)$ to maintain
$(\epsilon, 0)$-differential privacy.

The last consequence of our result is more indirect and is explained next. 
2.2 Information loss in differentially private protocols 



In [MMP+10], a connection was established between differentially private protocols and the notion
of mutual information from information theory. In fact, as [MMP+10] was dealing with 2-party
protocols, the connection was actually between differentially private protocols and that of
information content [BYJKS04, BBCR10] which is a symmetric variant of mutual information useful in
2-party protocols. In that paper, it was shown that the information content (which simplifies to
mutual information in our setting) between the transcript of an $\epsilon$-differentially private
mechanism and the database vector is bounded by $O(\epsilon n)$. Using the construction used in the
previous subsection, we show that in case of $(\epsilon, \delta)$-differentially private protocols
(for any $\delta = 2^{-o(n)}$), there is no non-trivial bound on the mutual information between the
transcript of the mechanism and the database vector. Thus as far as information theoretic
guarantees go, the situation is drastically different for pure differentially private protocols
vis-a-vis approximately differentially private protocols. The contents of this subsection are a
result of personal communication between the author and Salil Vadhan [DV10].

We first define the notion of mutual information (can be found in standard information theory 
textbooks). 

Definition 2.4 Given two random variables $X$ and $Y$, their mutual information $I(X; Y)$ is defined
as

$$I(X; Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X \mid Y)$$

where $H(X)$ denotes the Shannon entropy of $X$.
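
For finitely supported random variables, the first expression in Definition 2.4 can be evaluated directly from a joint probability table. The sketch below is a small illustration; the function names and the two toy joint distributions are our own.

```python
import math
from typing import Dict, Hashable, Tuple


def entropy(dist: Dict[Hashable, float]) -> float:
    """Shannon entropy (in bits) of a finitely supported distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def mutual_information(joint: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """I(X; Y) = H(X) + H(Y) - H(X, Y), computed from the joint table of (X, Y)."""
    px: Dict[Hashable, float] = {}
    py: Dict[Hashable, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p   # marginal of X
        py[y] = py.get(y, 0.0) + p   # marginal of Y
    return entropy(px) + entropy(py) - entropy(joint)


copy_of_bit = {(0, 0): 0.5, (1, 1): 0.5}                    # Y = X, X a uniform bit
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(copy_of_bit))    # one full bit is revealed
print(mutual_information(independent))    # nothing is revealed
```

In the setting of this subsection, $X$ is the database vector and $Y$ is the transcript $M(X)$, so a value close to $H(X)$ means the transcript essentially determines the database.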

The next claim establishes an upper bound on the mutual information between transcript of a 
differentially private protocol and the database vector. 

Claim 2.5 Let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be a query and
$M : (\mathbb{Z}^+)^d \to \mu(\mathbb{R}^k)$ be an $\epsilon$-differentially private
protocol for answering $F$ for databases of size $n$. If $X$ is a distribution over the inputs in
$(\mathbb{Z}^+)^d$, then $I(M(X); X) \le 3\epsilon n$.

Proof: We first note that since the databases are of size bounded by $n$, instead of assuming
that $X$ is a distribution over the inputs in $(\mathbb{Z}^+)^d$, we can assume that $X$ is a
distribution over the inputs in $[n]^d$ where $[n] = \{0, 1, \ldots, n\}$. Now, we can apply
Proposition 7 from [MMP+10]. We note that the aforesaid proposition is in terms of information
content for 2-party protocols but we observe that we can simply make the second party's input a
constant and get that $I(M(X); X) \le 3\epsilon n$. ■

Next, we state the following claim which says that for $(\epsilon, \delta)$-differentially private
protocols, even for an exponentially small $\delta$, the mutual information between the transcript
and the input can be as large as $n(1 - \eta)$ for any value of $0 < \epsilon, \eta < 1$. In other
words, an $(\epsilon, \delta)$-differentially private protocol does not imply any effective bound on
the mutual information between the input and the transcript even as $\epsilon \to 0$ and $\delta$ is
exponentially small.



Lemma 2.6 For $n \in \mathbb{N}$ and $0 < \epsilon, \eta < 1$, there is a constant
$C = C(\epsilon, \eta) > 0$ and a distribution $X$ over $(\mathbb{Z}^+)^n$ with a support over
databases of size $n$ and a query $F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ and an
$(\epsilon, \delta)$-differentially private protocol $M$ for answering $F$ such that
$I(X; M(X)) \ge n(1 - 2\eta)$ if $\delta \ge 2^{-C(\epsilon, \eta) \cdot n}$.

Proof: We first construct $2^s$ vectors in $\{0, 1\}^n$ (for $s = n(1 - \eta)$) with the property
that for any $x_i, x_j$ ($i \ne j$), $\|x_i - x_j\|_1 \ge \eta^2 n/8$. It is easy to guarantee the
existence of such a set of vectors by a simple application of the probabilistic method. The
distribution $X$ is simply the uniform distribution over the set $\{x_1, \ldots, x_{2^s}\}$. By
construction, all the databases in $X$ are of size bounded by $n$.

Next, we define the query $F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ in the same way as the query $F$
in the proof of Theorem 2.2. Following exactly the same calculations, we can show that if we set
$k = 80n$, we get a query $F : (\mathbb{Z}^+)^n \to \mathbb{R}^k$ such that for any $i \ne j$,
$\|F(x_i) - F(x_j)\|_2 \ge \eta^2 n \sqrt{k}/50$. We now recall the Gaussian mechanism of [DKM+06]
which maintains $(\epsilon, \delta)$-differential privacy.



Lemma 2.7 [DKM+06] Let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be a query. Let
$Y = (Y_1, \ldots, Y_k)$ be a distribution over $\mathbb{R}^k$ such that each $Y_i$ is an i.i.d.
$\mathcal{N}(0, \sigma^2)$ random variable. Here $\sigma^2 = c \cdot k \log(1/\delta)/\epsilon^2$
for a suitable absolute constant $c > 0$. Then the mechanism $M$ which, for a database $x$ and
query $F$, samples $Y_0$ from $Y$ and responds by $F(x) + Y_0$ is an
$(\epsilon, \delta)$-differentially private mechanism.

Note that for the above mechanism $M$ and database $x$, if $Z$ is sampled from $M(x)$, then the
distribution of $Z - F(x)$ is the same as $(Y_1, \ldots, Y_k)$ where each $Y_i$ is an i.i.d.
$\mathcal{N}(0, \sigma^2)$ random variable. Thus,

$$\|M(x) - F(x)\|_2^2 \sim Y_1^2 + \ldots + Y_k^2.$$

As the following fact shows, the distribution on the right hand side is concentrated around its
mean. The fact is possibly well-known but we could not find a reference and hence we prove it in
Appendix C.

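Fact 2.8 If $Y_1, \ldots, Y_k$ are i.i.d. $\mathcal{N}(0, \sigma^2)$ random variables, then

$$P_{Y_1, \ldots, Y_k}\left[ Y_1^2 + \ldots + Y_k^2 \ge 2(1 + \xi) \cdot k \cdot \sigma^2 \right] \le 2^{-k\xi/2}.$$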



Using the above fact, we get

$$P\left[ \|M(x) - F(x)\|_2^2 \ge 2(1 + \xi) \cdot k \cdot \sigma^2 \right] \le 2^{-k\xi/2}.$$



Here the probability is over the randomness of the mechanism. Putting $\xi = 1$ and
$\delta = 2^{-C(\epsilon, \eta) \cdot n}$ for an appropriate constant $C(\epsilon, \eta)$, we get that

$$P\left[ \|M(x) - F(x)\|_2 \ge \frac{\eta^2 n \sqrt{k}}{200} \right] \le 2^{-40n}.$$



As we know, for any $i \ne j$, $\|F(x_i) - F(x_j)\|_2 \ge \eta^2 n \sqrt{k}/50$. Hence, with
probability at least $1 - 2^{-n}$ over the randomness of the mechanism, for any database
$x_i \in \mathrm{supp}(X)$, if $y$ is sampled from $M(x_i)$, then

$$\forall j \ne i, \quad \|F(x_j) - y\|_2 > \|F(x_i) - y\|_2.$$

Thus, for any $x_i$, given $M(x_i)$, we can recover $x_i$ with high probability and hence, we can say

$$P_{y \sim M(X)}\left[ H(X \mid M(X) = y) = 0 \right] \ge 1 - 2^{-n}.$$

This means that

$$H(X \mid M(X)) \le 2^{-n} \cdot n \le 1.$$

Recall that $I(X; M(X)) = H(X) - H(X \mid M(X)) \ge H(X) - 1 = (1 - \eta)n - 1 \ge (1 - 2\eta)n$.
This completes the proof of Lemma 2.6. ■
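
The recovery step in this proof is nearest-codeword decoding: since the query values $F(x_i)$ are far apart in $\ell_2$ while the Gaussian noise is comparatively small, the closest $F(x_i)$ to the transcript identifies the database. The following sketch illustrates this on made-up data. The three `codewords` stand in for the vectors $F(x_i)$, the value of `sigma` is arbitrary rather than the calibrated $\sigma$ of Lemma 2.7, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def gaussian_mechanism(answer: Sequence[float], sigma: float,
                       rng: random.Random) -> List[float]:
    """Respond with F(x) + Y, where Y has i.i.d. N(0, sigma^2) coordinates."""
    return [v + rng.gauss(0.0, sigma) for v in answer]


def decode(codewords: Sequence[Sequence[float]], y: Sequence[float]) -> int:
    """Index i minimising ||F(x_i) - y||_2 (squared distances suffice for the argmin)."""
    def dist2(c: Sequence[float]) -> float:
        return sum((ci - yi) ** 2 for ci, yi in zip(c, y))
    return min(range(len(codewords)), key=lambda i: dist2(codewords[i]))


codewords = [(0.0,) * 6, (10.0,) * 6, (0.0, 10.0) * 3]   # well-separated toy F(x_i)
rng = random.Random(1)
recovered = [decode(codewords, gaussian_mechanism(c, 0.1, rng)) for c in codewords]
print(recovered)
```

When decoding succeeds for every database in the support, the transcript determines the input, which is why the conditional entropy $H(X \mid M(X))$ is so small.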

3 Lower bound on noise for counting queries 

In the last section, we proved that to preserve $\epsilon$-differential privacy for $k$ queries,
one may need to add $\Omega(k/\epsilon)$ noise provided $d, n \ge k$. However, these queries were
not counting queries. It is interesting to derive lower bounds on noise required to preserve
privacy for counting queries as these are the queries mostly used in practice. While one might
initially hope to prove a similar lower bound for counting queries, [BLR08] shows that there is an
$\epsilon$-differentially private mechanism which adds $O(n^{2/3}/\epsilon)$ noise per query and can
answer $O(n)$ counting queries (when $d = n^{O(1)}$).

Still, Hardt and Talwar [HT10] showed that to answer $k$ counting queries, any mechanism which
is $\epsilon$-differentially private must add
$\Omega(\min\{k/\epsilon, \sqrt{k \log(d/k)}/\epsilon\})$ noise (in fact, this is true for $k$
random queries). However, [HT10] make a technical assumption that the mechanism has a smooth
extension which works for "fractional" databases as well. In other words, they require the domain
of the mechanism to be $(\mathbb{R}^+)^d$ as opposed to $(\mathbb{Z}^+)^d$. However, it is not clear
if this is always true i.e., if given a mechanism which is defined only over true (integral)
databases, one can get a mechanism which is defined over "fractional" databases with similar
privacy guarantees.

Next, we prove the same result without making any such technical assumptions. Again, our
constructions are dependent on combinatorial designs [Pau85]. First, we prove the following simple
but useful claim.

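Claim 3.1 Let $a \in \mathbb{Z}$ and assume $x_1, x_2, \ldots, x_{2^s} \in (\mathbb{Z}^+)^d$ such
that $\forall i$, every entry of $x_i$ is either $0$ or $a$. Also, for every $i \ne \ell$,
$\|x_i - x_\ell\|_1 \ge \Delta$. Then, for $k \ge 20s$, there is a linear query
$F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that for every $i, \ell \in [2^s]$ and $i \ne \ell$, the
following holds:

$$P_{j \in [k]}\left[ |F(x_i)_j - F(x_\ell)_j| \ge \Delta'/10 \right] \ge 1/40$$

where $\Delta' = \sqrt{\Delta \cdot a}$.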



Proof: Consider any $x_i, x_\ell$ such that $i \ne \ell$. Note that $z$ defined as
$z = x_i - x_\ell$ is such that all its entries are $0$ or $\pm a$, and also that $z$ has at least
$\Delta/a$ non-zero entries. If we choose $r \in \{-1, 1\}^d$ u.a.r., then note that

$$Y = \sum_{i=1}^{d} z_i \cdot r_i = \sum_{i : z_i = \pm a} z_i \cdot r_i.$$

Note that the total number of summands is $\ell' \ge \Delta/a$ and hence the distribution of the
random variable $Y$ is the same as choosing $r' \in \{-1, 1\}^{\ell'}$ and considering the random
variable

$$Y' = a \sum_{i=1}^{\ell'} r'_i.$$

However using Corollary B.2 we get

$$P\left[ |Y'| \ge \frac{\sqrt{\Delta \cdot a}}{10} \right] \ge \frac{1}{10}. \qquad (1)$$



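Now, let us choose $r_1, \ldots, r_k$ uniformly and independently at random from $\{-1, 1\}^d$ and
consider the linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ defined as

$$F(x) = (\langle r_1, x \rangle, \ldots, \langle r_k, x \rangle).$$

Set $\Delta' = \sqrt{\Delta \cdot a}$. Now, (1) and an application of the Chernoff bound imply that
for any $x_i, x_\ell$ ($i \ne \ell$),

$$P_{r_1, \ldots, r_k}\left[ P_{j \in [k]}\left[ |F(x_i)_j - F(x_\ell)_j| \ge \Delta'/10 \right] \ge 1/40 \right] \ge 1 - 2^{-k/10}.$$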



We now observe that the total number of pairs $(x_i, x_\ell)$ ($i \ne \ell$) is at most
$2^{2s} \le 2^{k/10}$. Applying a union bound, we get that there is some choice of
$r_1, \ldots, r_k$ (and hence a fixed $F$) such that for all $i \ne \ell$,

$$P_{j \in [k]}\left[ |F(x_i)_j - F(x_\ell)_j| \ge \Delta'/10 \right] \ge 1/40.$$

■
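
The random linear query used in this proof is easy to simulate. The sketch below draws $k$ random sign vectors and measures, for one pair of databases with entries in $\{0, a\}$, the fraction of coordinates on which the two answers differ by at least $\Delta'/10$. All concrete numbers, the seed, and the function names are hypothetical, and `delta_cap` stands for $\Delta$.

```python
import math
import random
from typing import List, Sequence


def random_sign_query(k: int, d: int, rng: random.Random) -> List[List[int]]:
    """Draw k rows r_1, ..., r_k independently and uniformly from {-1, 1}^d."""
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]


def apply_query(rows: Sequence[Sequence[int]], x: Sequence[int]) -> List[int]:
    """Linear query with coordinates F(x)_j = <r_j, x>."""
    return [sum(r * xi for r, xi in zip(row, x)) for row in rows]


def separated_fraction(rows: Sequence[Sequence[int]], x: Sequence[int],
                       y: Sequence[int], threshold: float) -> float:
    """Fraction of coordinates j with |F(x)_j - F(y)_j| >= threshold."""
    fx, fy = apply_query(rows, x), apply_query(rows, y)
    return sum(abs(u - v) >= threshold for u, v in zip(fx, fy)) / len(rows)


rng = random.Random(0)
a, d, k = 3, 40, 400
x = [a] * 10 + [0] * 30
y = [0] * 10 + [a] * 10 + [0] * 20
delta_cap = sum(abs(u - v) for u, v in zip(x, y))     # l1 distance between x and y
delta_prime = math.sqrt(delta_cap * a)                # Delta' = sqrt(Delta * a)
frac = separated_fraction(random_sign_query(k, d, rng), x, y, delta_prime / 10)
print(frac, frac >= 1 / 40)
```

The claim only needs a $1/40$ fraction of well-separated coordinates for every pair simultaneously, which is what the union bound over all pairs secures for a single fixed $F$.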



We now prove a lower bound on the noise required to maintain privacy for random counting
queries. As we have said before, Hardt and Talwar [HT10] proved the same result under an
additional assumption that the mechanism defined over integral databases can be smoothly extended
to fractional databases as well.

Theorem 3.2 For every $k, d \in \mathbb{N}$ and $1 > \epsilon > 0$, there is a counting query
$F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that any mechanism which maintains
$\epsilon$-differential privacy adds noise $\Omega(\min\{k/\epsilon, \sqrt{k \log(d/k)}/\epsilon\})$.
The size of the database is $n = O(k/\epsilon)$.



Proof: The proof strategy is to come up with databases meeting the hypothesis of Claim 3.1
and use Claim 3.1 to get a counting query $F$. We then use Theorem 2.1 to get a lower bound on
the distortion required by any private mechanism to answer $F$. We consider two cases:
$k \le \log d$ and $k > \log d$.

The first case is simple. Consider databases $x_1, \ldots, x_{2^{k/20}}$ such that each $x_i = \lfloor k/(80\epsilon) \rfloor \cdot e_i$, where $e_i$ is the standard unit vector in the $i$-th direction. This is possible as there are $d \ge 2^k$ different unit vectors. Note that for any $i \neq \ell$, $\|x_i - x_\ell\|_1 = 2 \lfloor k/(80\epsilon) \rfloor$. We can now apply Claim 3.1 and get that there is a linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ (using $\Delta = 2\lfloor k/(80\epsilon) \rfloor$ and $a = \lfloor k/(80\epsilon) \rfloor$) such that

$$\Pr_{j \in [k]}\left[\, |F(x_i)_j - F(x_\ell)_j| \ge \frac{\sqrt{2}}{10} \lfloor k/(80\epsilon) \rfloor \ge \frac{k}{800\epsilon} \,\right] \ge 1/40.$$

We see that there are $2^{k/20} = 2^s$ databases which differ by exactly $2\lfloor k/(80\epsilon) \rfloor = \Delta$. Note that $\Delta \le (s-1)/\epsilon$. Hence we can apply Theorem 2.1 to conclude that, to maintain $\epsilon$-differential privacy, any mechanism needs to add $k/(800\epsilon)$ noise. In fact, we note that the $\ell_2$ error of the answer returned by the mechanism needs to be $\Omega(k^{3/2}/\epsilon)$, which is quantitatively the same as the result in [HT10].

The second case is slightly more complicated. We use Claim A.1 to construct $x_1, \ldots, x_{2^{k/20}} \in (\mathbb{Z}^+)^d$ with the following properties:

• Every entry of any of the $x_i$'s is either $0$ or $a \in \mathbb{Z}$ such that $a \ge \log(d/k)/(160\epsilon)$.

• $\forall i$, $\|x_i\|_1 \le k/(80\epsilon)$ and $\forall i \neq j$, $\|x_i - x_j\|_1 \ge k/(160\epsilon)$.

Again, we can apply Claim 3.1 and get that there is a linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ (using $\Delta \ge k/(160\epsilon)$ and $a \ge \log(d/k)/(160\epsilon)$) such that $\forall i \neq \ell$,

$$\Pr_{j \in [k]}\left[\, |F(x_i)_j - F(x_\ell)_j| \ge \frac{1}{10} \cdot \frac{\sqrt{k \log(d/k)}}{160\epsilon} \,\right] \ge 1/40.$$

Again, we have $2^{k/20}$ databases which differ by at most $k/(40\epsilon)$, and hence we can apply Theorem 2.1 to conclude that, to maintain $\epsilon$-differential privacy, any mechanism needs to add $\Omega\left(\sqrt{k \log(d/k)}/\epsilon\right)$ noise.
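
To get a feel for how the two regimes of Theorem 3.2 compare, the following sketch evaluates the expression inside the $\Omega(\cdot)$. It is an illustration only: the function name is hypothetical, the hidden constants are dropped, and base-2 logarithms are an assumption.

```python
import math

def noise_lower_bound_scale(k, d, eps):
    """Evaluate min{k/eps, sqrt(k*log(d/k))/eps}; constants dropped, log base 2 assumed."""
    if not (0 < eps < 1) or not (0 < k < d):
        raise ValueError("expects 0 < eps < 1 and 0 < k < d")
    first = k / eps                                  # regime k <= log d
    second = math.sqrt(k * math.log2(d / k)) / eps   # regime k > log d
    return min(first, second)

small_k = noise_lower_bound_scale(k=4, d=1024, eps=0.5)
large_k = noise_lower_bound_scale(k=16, d=1024, eps=0.5)
```

For $k = 4$ and $d = 1024$ the first term is the smaller one, while for $k = 16$ the second term takes over, mirroring the case split in the proof.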



4 Lower bounds for approximate differential privacy 

In this section, we prove lower bounds on the noise required to maintain $(\epsilon, \delta)$-differential privacy for $\epsilon, \delta > 0$. Our lower bounds are valid for any $\delta > 0$ and are in fact tight for constant $\epsilon$ and $\delta$. We note that a quantitatively similar lower bound was proven for the class of $\ell$-way marginals by [KRSU10], though our proof (for random queries) is arguably much simpler.

In this section, we consider databases which are elements of $\{0,1\}^n$; in other words, we consider the case when the universe size is $d = n$ and the databases are allowed to have at most one element of each type. We note that restricting databases to bit vectors is a well-studied model in the literature, including [DN03, DMT07, MMP+10] among others.



We prove the following theorem. 

Theorem 4.1 For any $n \in \mathbb{N}$, $\epsilon > 0$ and $1/20 > \delta > 0$, there exist positive constants $\alpha$, $\gamma$ and $\eta$ such that there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ with $k = \alpha n$ such that any mechanism $M$ that satisfies

$$\Pr_{M}\left[\, \Pr_{i \in [k]}\left[ |M(x,F)_i - F(x)_i| \le \eta\sqrt{n} \right] \ge 1/2 + \gamma \,\right] \ge 3\sqrt{\delta}$$

is not $(\epsilon, \delta)$-differentially private. In other words, any mechanism $M$ which, with significant probability (namely $3\sqrt{\delta}$), answers at least a $1/2 + \gamma$ fraction of the $k$ queries with at most $\eta\sqrt{n}$ noise is not $(\epsilon, \delta)$-differentially private.

An immediate corollary is that there exists a positive constant $\alpha$ and a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ with $k = \alpha n$ such that any mechanism which adds $o(\sqrt{n})$ noise is not $(\epsilon, \delta)$-differentially private for $\epsilon > 0$ and $\delta < 1/20$.

To prove Theorem 4.1, we first need to introduce some definitions previously discussed in [MMP+10]. We note that [MMP+10] deals with the two-party setting, but the relevant definitions and the lemma we use here easily extend to the standard (curator-client) setting of privacy.

Definition 4.2 A random variable $Y \in \{0,1\}^n$ is said to be a $\delta$-approximate strongly $\alpha$-unpredictable bit source (for $\alpha \ge 1$) if with probability $1 - \delta$ over $i \in [n]$ and $(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_n) \leftarrow Y$,

$$\frac{1}{\alpha} \le \frac{\Pr[Y_i = 1 \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}, Y_{i+1} = y_{i+1}, \ldots, Y_n = y_n]}{\Pr[Y_i = 0 \mid Y_1 = y_1, \ldots, Y_{i-1} = y_{i-1}, Y_{i+1} = y_{i+1}, \ldots, Y_n = y_n]} \le \alpha.$$
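
For a small, explicitly given distribution, the quantity in Definition 4.2 can be computed by direct enumeration. The sketch below is illustrative only; the function name and the dictionary representation of the distribution are assumptions. It returns the probability that the conditional odds fall outside $[1/\alpha, \alpha]$, that is, the smallest $\delta$ for which the source is $\delta$-approximate strongly $\alpha$-unpredictable.

```python
from itertools import product

def unpredictability_failure(dist, n, alpha):
    """Probability, over uniform i in [n] and y drawn from dist, that the conditional
    odds of Y_i given the other bits fall outside [1/alpha, alpha].
    dist maps n-bit tuples to probabilities (missing tuples have mass 0)."""
    failure = 0.0
    for y, p_y in dist.items():
        if p_y == 0:
            continue
        for i in range(n):
            p1 = dist.get(y[:i] + (1,) + y[i + 1:], 0.0)
            p0 = dist.get(y[:i] + (0,) + y[i + 1:], 0.0)
            # The common conditioning event cancels, so the odds equal p1 / p0.
            ok = p0 > 0 and p1 > 0 and 1 / alpha <= p1 / p0 <= alpha
            if not ok:
                failure += p_y / n
    return failure

uniform = {y: 1 / 4 for y in product((0, 1), repeat=2)}
point_mass = {(1, 0): 1.0}
delta_uniform = unpredictability_failure(uniform, n=2, alpha=1.0)
delta_point = unpredictability_failure(point_mass, n=2, alpha=8.0)
```

The uniform distribution is perfectly unpredictable, while a point mass fails the condition at every position regardless of $\alpha$.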



The next lemma (proven in [MMP+10] for the two-party setting) roughly says that for any $(\epsilon, \delta)$-private mechanism, conditioned on the transcript of the mechanism, the distribution of the database is a $\delta$-approximate strongly $2^{\epsilon}$-unpredictable source. More precisely, we have the following lemma.

Lemma 4.3 Let $F : \{0,1\}^n \to \mathbb{R}^k$ be a query and $M$ be an $(\epsilon, \delta)$-differentially private mechanism for answering $F$. Let $X$ be the uniform distribution over $\{0,1\}^n$ and $T$ be the probability distribution over the transcripts of $M(x)$ when $x$ is drawn from $X$. Then for any $\mu > 0$ and $t \leftarrow T$, the distribution $X|_{T=t}$ is a $\delta_t$-approximate strongly $2^{\epsilon+\mu}$-unpredictable source such that

E [6t]<26- ^ 
ter 1 — e ^ 



The above lemma follows directly from Lemma 20 of [MMP+10] (full version), and hence we do not prove it here. Before proving Theorem 4.1, we need to recall the following theorem from [DMT07] (Theorem 24 in that paper).

Theorem 4.4 For any $\gamma > 0$ and any $\nu = \nu(n)$, there is a constant $\alpha = \alpha(\gamma) > 0$ such that for $k = \alpha n$, there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ and an algorithm $A$ such that given $y$ which satisfies

$$\Pr_{i \in [k]}\left[\, |y_i - F(x)_i| \le \nu \,\right] \ge \frac{1}{2} + \gamma$$

the output of A on y i.e., A{ii) = x' such that x' € {0, 1}" and \\x — x'\\i < -^ 
The following corollary follows immediately from Theorem 4.4.



Corollary 4.5 For any $\delta' > 0$, there are positive constants $\gamma = \gamma(\delta')$, $\eta = \eta(\delta')$, $\alpha = \alpha(\delta')$ such that for $k = \alpha n$, there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ and an algorithm $A$ such that given $y$ which satisfies

$$\Pr_{i \in [k]}\left[\, |y_i - F(x)_i| \le \eta\sqrt{n} \,\right] \ge \frac{1}{2} + \gamma,$$

the output of $A$ on $y$, i.e., $A(y) = x'$, is such that $x' \in \{0,1\}^n$ and $\|x - x'\|_1 \le \delta' n$.

We now prove Theorem 4.1.

Proof: [of Theorem 4.1]

Let $X$ denote the uniform distribution over $\{0,1\}^n$. First, using Lemma 4.3, we get that over the randomness of the mechanism $M$ and the choice of $x \in X$, if we sample a transcript $t$ from $M(x,F)$, then for any positive $\mu$, the distribution $X|_{M(x,F)=t}$ is a $\delta_t$-approximate strongly $2^{\epsilon+\mu}$-unpredictable source, where $\delta_t$ satisfies the bound on $\mathbb{E}_{t \in M(x,F)}[\delta_t]$ given in Lemma 4.3. Putting $\mu = 10$, we get that the distribution $X|_{M(x,F)=t}$ is a $\delta_t$-approximate strongly $2^{\epsilon+10}$-unpredictable source where $\mathbb{E}_{t \in M(x,F)}[\delta_t] \le 3\delta$. By an application of Markov's inequality, we get that with probability $1 - 2\sqrt{\delta}$ over the choice of $x$ and the randomness of the mechanism $M$, the distribution $X|_{M(x,F)=t}$ is a $2\sqrt{\delta}$-approximate strongly $2^{\epsilon+10}$-unpredictable source.

We now apply Corollary 4.5. In particular, we put $\delta' = \sqrt{\delta}$ and get that for some positive $\gamma, \eta, \alpha$ (which are functions of $\delta'$ and hence of $\delta$), there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ and an algorithm $A$ such that given $y$ which satisfies

$$\Pr_{i \in [k]}\left[\, |y_i - F(x)_i| \le \eta\sqrt{n} \,\right] \ge \frac{1}{2} + \gamma,$$

the output of $A$ on $y$, i.e., $A(y) = x'$, is such that $x' \in \{0,1\}^n$ and $\|x - x'\|_1 \le \sqrt{\delta} \cdot n$. Now, consider a mechanism $M$ which satisfies

$$\Pr_{M}\left[\, \Pr_{i \in [k]}\left[ |M(x,F)_i - F(x)_i| \le \eta\sqrt{n} \right] \ge 1/2 + \gamma \,\right] \ge \beta$$

for $\beta = 3\sqrt{\delta}$. Such a mechanism $M$ is not $(\epsilon, \delta)$-differentially private because, with probability at least $\beta = 3\sqrt{\delta}$, the algorithm $A$ will be able to predict at least a $1 - \sqrt{\delta}$ fraction of the positions, which contradicts the fact that with probability $1 - 2\sqrt{\delta}$, the distribution $X|_{M(x,F)=t}$ is a $2\sqrt{\delta}$-approximate strongly $2^{\epsilon+10}$-unpredictable source. ■

5 LP decoding, Euclidean sections and hardness of releasing $\ell$-way marginals

In this section, we consider attacks on privacy using linear programming. In particular, we use the technique of LP decoding (previously used in [DMT07] in the context of privacy) to give attacks which violate even minimal notions of privacy when a $1 - \epsilon_0$ fraction of the queries (for some $\epsilon_0 > 0$) is released with insufficient noise. We do this by establishing a connection between Euclidean sections and the use of LP decoding in the context of privacy, which does not seem to have appeared explicitly in the literature before. We remark that the relation between LP decoding and Euclidean sections is very well known in the context of compressed sensing [CRTV05]. However, in the case of privacy, the adversary is allowed to add small error to, say, 99% of the entries and arbitrary error to the remaining 1% of the entries. In the context of compressed sensing, by contrast, the adversary is allowed to add error to only 1% of the entries.

We first describe how to use linear programming in the context of privacy. Assume $x \in (\mathbb{Z}^+)^d$ is a database and $A : \mathbb{R}^d \to \mathbb{R}^k$ is a linear map which represents a counting query of arity $k$ made on the database $x$. The correct answers are given by $y = A \cdot x$. (To make sure that the queries are 1-Lipschitz, all the entries of $A$ come from $[-1, 1]$.) Suppose $\hat{y} \in \mathbb{R}^k$ is the answer returned by the mechanism. Then, consider the following optimization problem over $x'$ (which can be written as a linear program):

$$\text{Minimize } \|\hat{y} - y'\|_1 \text{ subject to } y' = A \cdot x' \qquad (2)$$

We would like to understand the conditions under which the solution to the above linear program, call it $\hat{x}$, is such that $\|x - \hat{x}\|_1$ is small. To state our main theorem, we first need to define a Euclidean section and list some of its basic properties.

Definition 5.1 $V \subseteq \mathbb{R}^k$ is said to be a $(\delta, d, k)$ Euclidean section if $V$ is a linear subspace of dimension $d$ and for every $x \in V$, the following holds:

$$\sqrt{k}\,\|x\|_2 \ge \|x\|_1 \ge \delta\sqrt{k}\,\|x\|_2.$$

A linear operator $A : \mathbb{R}^d \to \mathbb{R}^k$ is said to be $\delta$-Euclidean if the range of $A$ is a $(\delta, d, k)$ Euclidean section. We also note that because of Jensen's inequality, $\sqrt{k}\,\|x\|_2 \ge \|x\|_1$ holds trivially.

We remark that when we say a subspace $V \subseteq \mathbb{R}^k$ is Euclidean, we simply mean that there is some constant $\delta > 0$ such that $V$ is a $(\delta, \dim(V), k)$ Euclidean section. We also introduce the following notation. For any $\ell_p$ norm defined on $\mathbb{R}^k$, any $S \subseteq [k]$ and $x \in \mathbb{R}^k$, $\|x\|_{S,p}$ denotes the $\ell_p$ norm of the vector $x_S$, where $x_S$ is the projection of $x$ on the subset $S$. The next claim states a useful fact about Euclidean sections.

Claim 5.2 Let $V \subseteq \mathbb{R}^k$ be a $(\delta, d, k)$ Euclidean section. Then for every $0 < \delta' < \delta^2/4$, $S \subseteq [k]$ such that $|S| \le \delta' \cdot k$ and $x \in V$,

$$\|x\|_1 - 2\|x\|_{S,1} \ge \beta\sqrt{k}\,\|x\|_2,$$

where $\beta = \delta - 2\sqrt{\delta'}$.

Proof: Let $S \subseteq [k]$ and $|S| \le \delta' \cdot k$. Then, by Jensen's inequality, we can say that

$$\|x\|_{S,1} \le \sqrt{|S|} \cdot \|x\|_{S,2} \le \sqrt{|S|} \cdot \|x\|_2.$$

This implies that

$$\|x\|_1 - 2\|x\|_{S,1} \ge \|x\|_1 - 2\sqrt{|S|} \cdot \|x\|_2 \ge (\delta - 2\sqrt{\delta'})\sqrt{k}\,\|x\|_2.$$

We get the stated result by putting $\beta = \delta - 2\sqrt{\delta'}$. ■

Correctness of the LP decoding

We next state a theorem which roughly says the following. Assume $A$ is a linear map whose range is a Euclidean section and all the singular values of $A$ are at least $\sigma$. Also, let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be the query corresponding to the linear map defined by $A$ (with $k = \Theta(d)$). Then, given noisy values of $A \cdot x$ where, except for some fixed positive fraction of the coordinates, all coordinates are within $\alpha = o(\sigma)$ of the correct value, the linear program (2) gives $\hat{x}$ such that $\|\hat{x} - x\|_1 \ll d$.
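
The robustness of $\ell_1$ minimization to a few wildly wrong answers can be seen on a toy instance. The sketch below is an illustration under stated assumptions and not the paper's algorithm: it replaces the LP solver by brute-force search over a small grid of candidate databases (it minimizes the same $\ell_1$ objective), and the function name, query matrix and noise values are hypothetical.

```python
from itertools import product

def l1_decode(A, y_noisy, max_count):
    """Return the database x' in {0..max_count}^d minimizing ||y_noisy - A x'||_1.
    Brute force stands in for the linear program on this tiny instance."""
    d = len(A[0])

    def cost(x):
        return sum(abs(yi - sum(a * xi for a, xi in zip(row, x)))
                   for row, yi in zip(A, y_noisy))

    return min(product(range(max_count + 1), repeat=d), key=cost)

# Hypothetical toy instance: the 8 sign vectors in {-1, 1}^3 serve as queries.
A = list(product((-1, 1), repeat=3))
x = (2, 0, 1)
y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
# Small noise on seven answers, arbitrary error on the last one.
noise = [0.25, -0.25, 0.25, -0.25, 0.25, -0.25, 0.25, 100.0]
y_noisy = [yi + e for yi, e in zip(y, noise)]
recovered = l1_decode(A, y_noisy, max_count=3)
```

Here one of the eight answers is off by 100, yet the $\ell_1$ minimizer still returns the true database; a least-squares fit would be pulled toward the outlier.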

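Theorem 5.3 Let $A : \mathbb{R}^d \to \mathbb{R}^k$ be a full rank linear map ($k \ge d$) such that all the singular values of $A$ are at least $\sigma$. Further, suppose the range of $A$ (denoted by $C(A)$) is a $(\delta, d, k)$ Euclidean section. Let $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ be the query corresponding to $A$. Then, there exists $\gamma = \gamma(\delta)$ such that if

$$\Pr_{i \in [k]}\left[\, |F(x)_i - \hat{y}_i| \le \alpha \,\right] \ge 1 - \gamma,$$

then any solution $\hat{x}$ to the linear program (2) satisfies $\|\hat{x} - x\|_1 \le O(\alpha\sqrt{kd}/\sigma)$, where the constant inside the $O(\cdot)$ notation depends on $\delta$.

Proof: First of all, using Claim 5.2, we get that for any $S \subseteq [k]$ such that $|S| \le \delta^2 k/8$ and $z \in \mathbb{R}^d$,

$$\|A \cdot z\|_1 - 2\|A \cdot z\|_{S,1} \ge \left(\delta - 2\sqrt{\delta^2/8}\right)\sqrt{k}\,\|A \cdot z\|_2 \ge \frac{\delta}{4}\sqrt{k}\,\|A \cdot z\|_2 \ge \frac{\delta\sigma}{4}\sqrt{k}\,\|z\|_2. \qquad (3)$$

The last inequality uses that the singular values of $A$ are at least $\sigma$. Now, assume that $\hat{x}$ is the solution to the linear program (2). Also, assume that $\hat{y} = A \cdot x + e$. Set $\gamma = \delta^2/8$; then, given the assumption on the noise, $e$ satisfies

$$\Pr_{i \in [k]}\left[\, |e_i| > \alpha \,\right] \le \delta^2/8.$$

Also, let $\hat{x} = x + z$. The next few steps are identical to [DMT07], but we include them here for the sake of completeness. Since $\hat{x}$ is the solution to the linear program, we get

$$\|A \cdot x - \hat{y}\|_1 \ge \|A \cdot \hat{x} - \hat{y}\|_1$$
$$\|A \cdot x - \hat{y}\|_1 \ge \|A \cdot (\hat{x} - x) + A \cdot x - \hat{y}\|_1$$
$$\|e\|_1 \ge \|A \cdot z - e\|_1$$

Let $S = \{i : |e_i| > \alpha\}$ and let $\bar{S} = [k] \setminus S$. Then, from the above, we get that

$$\|e\|_{S,1} + \|e\|_{\bar{S},1} \ge \|A \cdot z - e\|_{S,1} + \|A \cdot z - e\|_{\bar{S},1} \ge \|e\|_{S,1} - \|A \cdot z\|_{S,1} + \|A \cdot z - e\|_{\bar{S},1}.$$

We next use $\|A \cdot z - e\|_{\bar{S},1} \ge \|A \cdot z\|_{\bar{S},1} - \|e\|_{\bar{S},1}$ on the right hand side of the above inequality to simplify and get

$$\|e\|_{\bar{S},1} \ge \|A \cdot z\|_{\bar{S},1} - \|e\|_{\bar{S},1} - \|A \cdot z\|_{S,1}$$
$$\|A \cdot z\|_1 - 2\|A \cdot z\|_{S,1} \le 2\|e\|_{\bar{S},1}$$
$$\|A \cdot z\|_1 - 2\|A \cdot z\|_{S,1} \le 2\alpha|\bar{S}|$$
$$\|A \cdot z\|_1 - 2\|A \cdot z\|_{S,1} \le 2\alpha(1 - \delta^2/8)k$$

Combining the above with (3), we get that

$$\|z\|_2 \le \frac{8\alpha(1 - \delta^2/8)\sqrt{k}}{\delta\sigma} \;\Rightarrow\; \|z\|_1 \le \frac{8\alpha(1 - \delta^2/8)\sqrt{kd}}{\delta\sigma} = O\left(\frac{\alpha\sqrt{kd}}{\sigma}\right). \; ■$$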



5.1 Releasing $\ell$-way marginals privately

We now formally define the problem of releasing $\ell$-way marginals. The universe $X$ is of the form $\{-1,1\}^{d'}$ and we have a counting query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ where $k = \binom{d'}{\ell} \cdot 2^{\ell}$ and $d = 2^{d'}$. For any $x \in (\mathbb{Z}^+)^d$, each coordinate of $x$ is identified with an element of $\{-1,1\}^{d'}$. Note that $[d']$ can be identified with a set of $d'$ binary attributes and any $z \in \{-1,1\}^{d'}$ denotes a specific setting of the attributes. Hence, for $x \in (\mathbb{Z}^+)^d$, the row associated with $z \in \{-1,1\}^{d'}$ counts how many elements have that specific setting of the $d'$ attributes.

Similarly, consider any $c \in \{-1,0,1\}^{d'}$ and define $|c| = \sum_{i=1}^{d'} |c_i|$, i.e., $|c|$ represents the number of non-zero entries in $c$. We note that the set $S$ defined as

$$S = \{c : c \in \{-1,0,1\}^{d'} \text{ and } |c| = \ell\}$$

has size $2^{\ell} \cdot \binom{d'}{\ell} = k$. Thus, every $c \in S$ can be identified with some specific setting of a specific set of $\ell$ among the $d'$ attributes. Hence, we identify the set $S$ with $[k]$, and for any $y \in \mathbb{R}^k$, every coordinate of $y$ is identified with some specific $c \in S$. We now define $F$. Consider a database $x$ and a row corresponding to $c$. Let $Z$ be the set of $\ell$ attributes and $A$ be their setting corresponding to $c$. Then the row of $F(x)$ corresponding to $c$ simply counts the number of elements in $x$ where the attributes in $Z$ are indeed set to $A$. We now define $F$ more formally.

For any $c \in \{-1,0,1\}^{d'}$ and $z \in \{-1,1\}^{d'}$, we define $F_c(z)$ as follows:

$$F_c(z) = \prod_{j \,:\, c_j \neq 0} \mathbf{1}[z_j = c_j].$$

In other words, $F_c(z)$ is 1 iff the following holds for every $j$: if $c_j = 1$, then $z_j = 1$, and if $c_j = -1$, then $z_j = -1$.

We now define the counting query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ as

$$F(x) = \left( \sum_{z \in \{-1,1\}^{d'}} x_z \cdot F_{c_1}(z), \; \ldots, \; \sum_{z \in \{-1,1\}^{d'}} x_z \cdot F_{c_k}(z) \right),$$

where $c_1, \ldots, c_k$ is an enumeration of $S$.
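
The definition of the $\ell$-way marginal query can be made concrete with a short sketch. It is illustrative only: the function names and the dictionary-based histogram representation of the database are assumptions, not notation from the paper.

```python
from itertools import combinations, product

def marginal_queries(d_attrs, ell):
    """All c in {-1, 0, 1}^d' with exactly ell non-zero entries."""
    queries = []
    for positions in combinations(range(d_attrs), ell):
        for signs in product((-1, 1), repeat=ell):
            c = [0] * d_attrs
            for p, s in zip(positions, signs):
                c[p] = s
            queries.append(tuple(c))
    return queries

def f_c(c, z):
    """1 if z agrees with c on every non-zero coordinate of c, else 0."""
    return int(all(cj == 0 or cj == zj for cj, zj in zip(c, z)))

def marginals(histogram, d_attrs, ell):
    """histogram maps each z in {-1, 1}^d' to its count x_z; returns F(x) indexed by c."""
    return {c: sum(count * f_c(c, z) for z, count in histogram.items())
            for c in marginal_queries(d_attrs, ell)}

# Hypothetical toy table with d' = 3 attributes and n = 6 rows.
hist = {(1, 1, -1): 2, (1, -1, -1): 1, (-1, 1, 1): 3}
table = marginals(hist, d_attrs=3, ell=2)
```

Each row of the table satisfies exactly one sign pattern for every choice of $\ell$ attributes, so the entries of $F(x)$ sum to $n \cdot \binom{d'}{\ell}$.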

We now state the main theorem of this section.

Theorem 5.4 Let $q, \ell \in \mathbb{N}$ be constant integers. Then, there exists a constant $\gamma = \gamma(q,\ell) > 0$ such that any mechanism which releases the $\ell$-way marginals of a table of size $n$ over $d'$ attributes, with $n \ll d'^{\ell-1}/\log_{(q)} n$, by adding at most $\eta$ noise to a $1 - \gamma$ fraction of the queries, where

$$\eta = o(\sqrt{n}),$$

is attribute non-private. Further, the algorithm which violates attribute privacy is efficient and uses LP decoding.

Here $\log_{(q)} n$ is an iterated logarithm, which is defined precisely in Definition 5.10. We state the main result of this section up front, before describing the structure of the proof.

Before describing the proof structure, we first state the following theorem due to Kasiviswanathan et al. [KRSU10], which is a weaker version of our result. We state their result only for the case
when d'^~^ ^ n ■ log ~ n because our subsequent improvements are valid (and possible) only in 
the regime of d ^ n. 

Theorem 5.5 JKRSUIO^ Let i £ 'N be a constant and n,(i G N such that d'^^^ > n • log^^~^n. 
Then, for every mechanism M which releases i-way marginals of a database of size n (and universe 
{0, 1}'* ) such that the noise for every single query is bounded by rj where 



in 
77 <C 



is attribute non-private. In fact, the attack is an efficient algorithm based on $\ell_2$ norm minimization.

Before we look at how they prove their result and at our improvement on it, we need to describe the Hadamard product of matrices.

Definition 5.6 Let $A_1, \ldots, A_s$ be matrices with $A_j \in \mathbb{R}^{\ell_j \times n}$. Then, the Hadamard product of $A_1, \ldots, A_s$ is denoted by $A = A_1 \circ A_2 \circ \cdots \circ A_s \in \mathbb{R}^{L \times n}$, where $L = \ell_1 \cdots \ell_s$, and is defined as follows. Every row of $A$ is identified with a unique element of $[\ell_1] \times \cdots \times [\ell_s]$. Also, corresponding to $i = (i_1, \ldots, i_s)$,

$$A[i,k] = \prod_{j=1}^{s} A_j[i_j, k],$$

where $A[i,k]$ represents the element in row $i$ and column $k$.
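
A direct transcription of this definition into code may help fix the indexing. This is a minimal sketch with matrices as lists of rows; the function name is hypothetical.

```python
from itertools import product
from math import prod

def hadamard_product(mats):
    """Row-wise product in the sense of Definition 5.6: the row indexed by
    (i_1, ..., i_s) holds prod_j A_j[i_j][k] in column k."""
    n = len(mats[0][0])
    if any(len(row) != n for m in mats for row in m):
        raise ValueError("all matrices must have n columns")
    rows = []
    for idx in product(*(range(len(m)) for m in mats)):
        rows.append([prod(m[i][k] for m, i in zip(mats, idx)) for k in range(n)])
    return rows

A1 = [[1, 0], [1, 1]]
A2 = [[1, 1], [0, 1]]
A = hadamard_product([A1, A2])
```

The result has $\ell_1 \cdot \ell_2 = 4$ rows and $n = 2$ columns, with rows ordered lexicographically by $(i_1, i_2)$.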

The attack in [KRSU10] is an efficient algorithm (it is simply matrix inversion) and essentially depends on showing the existence of a Hadamard product of small matrices all of whose singular values are large. In particular, they show the following reduction.

Lemma 5.7 Let $A_1, \ldots, A_{\ell-1} \in \mathbb{R}^{d' \times n}$ and $A = A_1 \circ A_2 \circ \cdots \circ A_{\ell-1}$ (with $d'^{\ell-1} \ge n$) be such that all singular values of $A$ are at least $\sigma$. Further, suppose all entries of $A_1, \ldots, A_{\ell-1}$ are in the set $\{0,1\}$. Then, any mechanism $M$ which adds $\ll (\sqrt{n} \cdot \sigma)/\sqrt{d'^{\ell-1}}$ noise to every coordinate while releasing all the $\ell$-way marginals of a table of size $n$ with $d'$ attributes is attribute non-private.

Subsequently, to prove Theorem 5.5, they proved the following lemma.

Lemma 5.8 Let D ~ M " be a distribution over matrices such that every entry of the matrix is 
an independent and unbiased {0, 1} random variable. Let Ai, . . . ,Ai_i be i.i.d. copies of random 
matrices drawn from the distribution D and A be the Hadamard product of Ai, . . . ,A£_i. Then, 
provided that d S> n-log ~ n, with probability 1 — o(l), the smallest singular value of A denoted 
by CTniA) satisfies 

CTniA) > ^ 

log ^ n 

This lemma clearly suffices to prove Theorem 5.5. One of the main avenues for improving the result in [KRSU10] is that, because the attack is matrix inversion (in other words, $\ell_2$ distance minimization), it is inherently sensitive to noise. In particular, it is not evident from the result in [KRSU10] whether there is still an attack which violates attribute privacy when even a single one of the $\ell$-way marginals is answered with arbitrary noise. In this respect, LP decoding seems to be a much more powerful method and is usually robust even if a fraction of the queries are answered with wild noise. In particular, we can combine the techniques in the proof of Theorem 5.3 and Lemma 5.7 to get the following lemma. (We do not prove it here because it is a straightforward combination of the two techniques.)

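Lemma 5.9 Let $A_1, \ldots, A_{\ell-1} \in \mathbb{R}^{d' \times n}$ and $A = A_1 \circ A_2 \circ \cdots \circ A_{\ell-1}$ (with $d'^{\ell-1} \ge n$) be such that all entries of $A_1, \ldots, A_{\ell-1}$ are in the set $\{0,1\}$. Also, suppose all the singular values of $A$ are at least $\sigma$ and the range of $A$, i.e., $C(A)$, is a $(\delta, n, d'^{\ell-1})$ Euclidean section. Then, there exists a constant $\gamma = \gamma(\delta) > 0$ such that any mechanism which answers at least a $1 - \gamma$ fraction of the $\ell$-way marginals with noise bounded by $\alpha$ is attribute non-private provided $\frac{\alpha\sqrt{n \, d'^{\ell-1}}}{\sigma} = o(n)$, or in other words,

$$\alpha = o\left(\sqrt{n} \cdot \sigma / \sqrt{d'^{\ell-1}}\right).$$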

To describe the main technical result of Rudelson [Rud11], we need to define iterated logarithms.

Definition 5.10 For $r \in \mathbb{N}$, we define $\log_{(r)} n$ as follows:

• $\log_{(1)} n = \max\{\log_2 n, 1\}$

• $\log_{(r)} n = \log_{(1)}(\log_{(r-1)} n)$
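
The recursion in this definition unrolls into a simple loop. The following is a minimal sketch; the function name is hypothetical.

```python
import math

def iterated_log(n, r):
    """log_(r) n, where log_(1) n = max(log2 n, 1) and
    log_(r) n = log_(1)(log_(r-1) n)."""
    if r < 1 or n <= 0:
        raise ValueError("expects r >= 1 and n > 0")
    value = float(n)
    for _ in range(r):
        value = max(math.log2(value), 1.0)
    return value

values = [iterated_log(65536, r) for r in range(1, 6)]
```

Because of the clamp at 1, the sequence stabilizes at 1 once the argument drops to 2 or below, so $\log_{(r)} n$ is well defined for every $r$.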

Theorem 5.11 [Rud11] Let $q, \ell \in \mathbb{N}$ be constants. Also, let $D$ be a distribution over matrices in $\mathbb{R}^{d' \times n}$ such that every entry of the matrix is an independent and unbiased $\{0,1\}$ random variable. Let $A_1, \ldots, A_{\ell-1}$ be i.i.d. copies of random matrices drawn from the distribution $D$ and $A$ be the Hadamard product of $A_1, \ldots, A_{\ell-1}$. Then, provided that $d'^{\ell-1} \gg n \log_{(q)} n$, with probability $1 - o(1)$, the smallest singular value of $A$, denoted by $\sigma_n(A)$, satisfies

$$\sigma_n(A) = \Omega\left(\sqrt{d'^{\ell-1}}\right).$$

Also, the range of $A$ is a $(\gamma(q,\ell), n, d'^{\ell-1})$ Euclidean section for some $\gamma(q,\ell) > 0$.

Theorem 5.11 and Lemma 5.9 immediately imply our main theorem, which we restate here for convenience.

Theorem 5.12 Let $q, \ell \in \mathbb{N}$ be constant integers. Then, there exists a constant $\gamma = \gamma(q,\ell) > 0$ such that any mechanism which releases the $\ell$-way marginals of a table of size $n$ over $d'$ attributes, with $n \ll d'^{\ell-1}/\log_{(q)} n$, by adding at most $\eta$ noise to a $1 - \gamma$ fraction of the queries, where

$$\eta = o(\sqrt{n}),$$

is attribute non-private. Further, the algorithm which violates attribute privacy is efficient and uses LP decoding.

6 Noise lower bounds for blatant non-privacy 

In this section, we prove lower bounds on the noise required to prevent blatant non-privacy while answering random counting queries. Dinur and Nissim [DN03], in their seminal paper, showed that answering $O(n \log^2 n)$ subset sum queries with $o(\sqrt{n})$ noise results in blatant non-privacy. In other words, they proved the following theorem.

Theorem 6.1 [DN03] For every $n \in \mathbb{N}$, there is a counting query $F : \{0,1\}^n \to \mathbb{R}^k$ for $k = O(n \log^2 n)$ and an algorithm $A$ such that if $M(x,F)$ is the answer returned by the mechanism for the database $x$ and query $F$ and

$$\forall i \quad |M(x,F)_i - F(x)_i| = o(\sqrt{n})$$

then

$$A(M(x,F)) = x' : \|x - x'\|_1 = o(n).$$

The algorithm $A$ in the above result is efficient and uses linear programming. Since then, several improvements have been made to this result, including the results in [DMT07], where the same conclusion was achieved under the weaker hypothesis that

$$\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x)_i| = o(\sqrt{n}) \,\right] \ge 1/2 + \eta$$

for any $\eta > 0$. While that attack was inefficient, they also showed how to use LP decoding to get an efficient attack, which they achieved under the hypothesis

$$\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x)_i| = o(\sqrt{n}) \,\right] \ge 0.761.$$

Further, in the results of [DMT07], the domain of the database was $\mathbb{R}^n$ as opposed to $\{0,1\}^n$. However, all these results achieved blatant non-privacy when the size of the universe was the same as the size of the database, namely $n$. In this section, we consider the modified setting where the size of the universe is smaller than the size of the database. As the next theorem shows, when the universe size is smaller, the noise required to prevent blatant non-privacy is much larger.

Theorem 6.2 For every $n, d \in \mathbb{N}$ and $\eta > 0$, there is an algorithm $A$ and a counting query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ with $k = 8(d \cdot \log n)/\eta^2$ such that if $M(x,F)$ is the answer returned by the mechanism and

$$\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x)_i| = o\left(\frac{n}{\sqrt{d}}\right) \right] \ge 1/2 + \eta$$

then

$$A(M(x,F)) = x' : \|x - x'\|_1 = o(n).$$

Similarly, for every $n, d \in \mathbb{N}$ and $\theta = \theta(n,d) \in \mathbb{R}^+$ with $\theta < n$, there is an algorithm $B$ and a counting query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ with $k = 2d \log n \cdot \exp\left(\frac{2d\theta^2}{n^2}\right)$ such that if $M(x,F)$ is the answer returned by the mechanism and

$$\forall i \quad |M(x,F)_i - F(x)_i| = o(\theta)$$

then

$$B(M(x,F)) = x' : \|x - x'\|_1 = o(n).$$

Before going ahead with the proof, we remark that while both the algorithms $A$ and $B$, as described in the proof, are inefficient, the former can be made efficient using linear programming (along the lines of [DN03, DMT07]). We choose not to do so for the sake of simplicity.

Proof: The algorithm $A$ is as follows:

• Enumerate over all $x' \in (\mathbb{Z}^+)^d$ such that $\|x'\|_1 \le n$.

• Return $x'$ if $\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x')_i| = o\left(\frac{n}{\sqrt{d}}\right) \right] \ge 1/2 + \eta/4$.

It is clear that the algorithm returns an answer, as $x' = x$ satisfies the second condition. Now, consider any $c > 0$ and $x'$ and $x$ which differ by $cn$. Then, choose $r_1, \ldots, r_k \in \{-1,1\}^d$ uniformly at random and let us define the linear query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ as follows:

$$F(x) = \left( \sum_{j=1}^{d} x_j \cdot r_{1j}, \; \ldots, \; \sum_{j=1}^{d} x_j \cdot r_{kj} \right).$$
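
The enumeration performed by algorithm $A$ can be sketched for a tiny instance as follows. This is an illustration under stated assumptions: the function name and the toy queries are hypothetical, the asymptotic tolerance $o(n/\sqrt{d})$ is replaced by an explicit parameter `tol`, and all qualifying candidates are returned rather than a single one.

```python
from itertools import product

def consistent_databases(answers, rows, n, tol, eta):
    """Brute-force version of algorithm A: every x' with ||x'||_1 <= n whose exact
    answers lie within tol of `answers` on at least a 1/2 + eta/4 fraction of queries."""
    d = len(rows[0])
    found = []
    for cand in product(range(n + 1), repeat=d):
        if sum(cand) > n:
            continue
        exact = [sum(r * c for r, c in zip(row, cand)) for row in rows]
        close = sum(1 for m, f in zip(answers, exact) if abs(m - f) <= tol)
        if close / len(rows) >= 0.5 + eta / 4:
            found.append(cand)
    return found

# Hypothetical toy instance: true database x = (2, 1), one wildly wrong answer.
rows = [(1, 1), (1, -1), (1, 1), (1, -1)]
answers = [3.2, 0.9, 50.0, 1.1]
found = consistent_databases(answers, rows, n=3, tol=0.5, eta=0.25)
```

The running time is exponential in $d$, which matches the remark that the algorithm as described is inefficient.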

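Now, consider $F_\ell(x) - F_\ell(x') = \sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j} = \sum_{j=1}^{d} z_j \cdot r_{\ell j}$, where $z = x - x'$ with $\|z\|_1 \ge cn$. Then, by Theorem B.1, there exists $c' = c'(c,\eta)$ such that

$$\Pr_{r_\ell \in \{-1,1\}^d}\left[\, \left|\sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j}\right| \le \frac{c' n}{\sqrt{d}} \,\right] \le \eta/100.$$

Let us define the set $S = \left\{\ell : \left|\sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j}\right| \le \frac{c' n}{\sqrt{d}}\right\}$. Then a Chernoff bound implies that

$$\Pr_{r_1,\ldots,r_k}\left[\, |S| > (\eta \cdot k)/4 \,\right] \le \exp(-\eta^2 k/4). \qquad (4)$$

Putting $k = \frac{8}{\eta^2} \cdot d \cdot \log n$, we get that the probability is strictly bounded by $n^{-2d}$. Note that the total number of pairs $x, x' \in (\mathbb{Z}^+)^d$ such that $\|x\|_1, \|x'\|_1 \le n$ is bounded by $n^{2d}$. Thus, applying the union bound with (4), we can fix $r_1, \ldots, r_k$ and get $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that if $x, x' \in (\mathbb{Z}^+)^d$, $\|x\|_1 \le n$, $\|x'\|_1 \le n$ and $\|x - x'\|_1 \ge cn$, then

$$\Pr_{i \in [k]}\left[\, |F(x)_i - F(x')_i| \ge \frac{c' n}{\sqrt{d}} \,\right] \ge 1 - \eta/4.$$

Using the hypothesis, i.e.,

$$\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x)_i| = o\left(\frac{n}{\sqrt{d}}\right) \right] \ge 1/2 + \eta,$$

we get that for all $x'$ such that $\|x - x'\|_1 \ge cn$,

$$\Pr_{i \in [k]}\left[\, |M(x,F)_i - F(x')_i| = o\left(\frac{n}{\sqrt{d}}\right) \right] < 1/2 + \eta/4.$$

Thus, with the query $F$ described above, the algorithm $A$ will never return $x'$ if $\|x - x'\|_1 \ge cn$, and hence the correctness is proven.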

We now come to the second part of the theorem. The algorithm $B$ is described as follows:

• Enumerate over all $x' \in (\mathbb{Z}^+)^d$ such that $\|x'\|_1 \le n$.

• Return $x'$ if $\forall i$, $|M(x,F)_i - F(x')_i| = o(\theta)$.

Again, we first observe that the algorithm $B$ always returns an answer, as $x' = x$ satisfies the second condition. To prove the second part of the result, we define $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ in exactly the same way as before. Now, consider any $\ell \in [k]$ and $F_\ell(x) - F_\ell(x') = \sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j} = \sum_{j=1}^{d} z_j \cdot r_{\ell j}$, where $z = x - x'$ with $\|z\|_1 \ge cn$. Then, by Theorem B.1,

$$\Pr_{r_\ell \in \{-1,1\}^d}\left[\, \left|\sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j}\right| \ge c\theta \,\right] \ge \exp\left(-\frac{2d\theta^2}{n^2}\right).$$

This means that putting $k = \exp\left(\frac{2d\theta^2}{n^2}\right) \cdot 2d \log n$, we get

$$\Pr_{r_1,\ldots,r_k \in \{-1,1\}^d}\left[\, \exists \ell \in [k] : \left|\sum_{j=1}^{d} (x_j - x'_j) \cdot r_{\ell j}\right| \ge c\theta \,\right] \ge 1 - n^{-2d}.$$

As before, by a union bound, we can fix $r_1, \ldots, r_k$ and get a query $F : (\mathbb{Z}^+)^d \to \mathbb{R}^k$ such that for any $x, x' \in (\mathbb{Z}^+)^d$ with $\|x\|_1 \le n$, $\|x'\|_1 \le n$ and $\|x - x'\|_1 \ge cn$,

$$\exists i \in [k] : |F(x)_i - F(x')_i| \ge c \cdot \theta.$$

Using the hypothesis, i.e.,

$$\forall i \quad |M(x,F)_i - F(x)_i| = o(\theta),$$

we get that (for all $x'$ such that $\|x - x'\|_1 \ge cn$)

$$\exists i \in [k] : |M(x,F)_i - F(x')_i| = \Omega(\theta).$$

This shows that algorithm $B$ will never return $x'$ if $\|x - x'\|_1 \ge cn$, and hence the correctness is proven. ■
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A Construction of databases with large differences 

In this section, we give constructions of sets of vectors in (Z^) (in other words, sets of databases) 
which were used in the results in Section [2j 

Claim A.1 There exists a constant $C > 1$ such that for any $d \in \mathbb{N}$, $0 < \epsilon < 1$ and $\log d \le k \le d/C$, there is an integer $a \ge \log(d/k)/(160\epsilon)$ such that it is possible to construct $2^{k/20}$ vectors $x_1, \ldots, x_{2^{k/20}} \in (\mathbb{Z}^+)^d$ with the following properties:

• Every entry of $x_i$ is either $0$ or $a$.

• $\forall i$, $\|x_i\|_1 \le k/(80\epsilon)$ and $\forall i \neq j$, $\|x_i - x_j\|_1 \ge k/(160\epsilon)$.
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Proof: We use construction of combinatorial designs from |Pau851 IRRV99J . Namely, the main 
theorem in |Pau85j states that it is possible to construct sets Si, ... , Sm C [d] with the following 
properties : 

• \/i, \Si\ = £ 

• Vi ^ j, {SiD Sj\ < p 

provided d > C ■ '™ for some large constant C. We now see that if we put £ = [k/log{d/k)\, 

p = i/4 and m = 2''''^^, then indeed d > C ■ —^ — provided C is sufficiently large compared to 
C. 

Now, consider the set A = {yi, . . . , y2k/2o} be the characteristic vectors of the sets Si, ... , S2k/2o. 
Now, we observe that setting Xi = [(log(d/A;)/80e)J • yi achieves all the stated conditions. ■ 
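
The passage from a combinatorial design to databases can be illustrated on a tiny example. The design below is hypothetical and hand-picked (it is not produced by the construction cited in the proof); the sketch only shows how set size and intersection size translate into $\ell_1$ norm and $\ell_1$ distance after scaling the characteristic vectors by $a$.

```python
from itertools import combinations

def scaled_characteristic_vectors(sets, d, a):
    """Map each set S_i (a subset of range(d)) to the vector a * 1_{S_i}."""
    return [[a if j in s else 0 for j in range(d)] for s in sets]

def l1_distance(u, v):
    return sum(abs(p - q) for p, q in zip(u, v))

# Hypothetical toy design: sets of size 4 over [9] with pairwise intersections of size 1.
sets = [{0, 1, 2, 3}, {3, 4, 5, 6}, {0, 4, 7, 8}]
a = 5
vectors = scaled_characteristic_vectors(sets, d=9, a=a)
norms = [sum(v) for v in vectors]
distances = [l1_distance(u, v) for u, v in combinations(vectors, 2)]
```

With set size 4 and intersections of size at most 1, every vector has $\ell_1$ norm $4a$ and every pair is at distance at least $2a(4 - 1) = 6a$.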

Claim A.2 For any $d, d' \in \mathbb{N}$ with $d' \le d$, $1/40 > \epsilon > 0$ and $n = d'/(4\epsilon)$, there exists $S \subseteq (\mathbb{Z}^+)^d$ such that the following conditions hold:

• $\forall x \in S$, $\|x\|_1 \le n$

• $\forall x, y \in S$ with $x \neq y$, $\|x - y\|_1 \ge n/10$

• $|S| \ge 2^{4d'/5}$

Proof: Consider a set $C \subseteq \{0,1\}^d$ with the following three properties.

• $|C| \ge 2^{4d'/5}$

• $\forall x, y \in C$ with $x \neq y$, $\|x - y\|_1 \ge d'/9$

• $\forall x \in C$, $\|x\|_1 \le d'$.

Such a set $C$ exists. To see this, consider an error-correcting code $C' \subseteq \{0,1\}^{d'}$ with distance $d'/9$ and rate $4/5$. Such a code exists via the probabilistic method. Now, the set $C$ is constructed as

$$C = \{x = y \circ 0^{d-d'} : y \in C'\}.$$

Now, consider the set $S \subseteq (\mathbb{Z}^+)^d$ such that

$$S = \{x : x = \lfloor 1/(4\epsilon) \rfloor \cdot z \text{ and } z \in C\}.$$

We claim that the set $S$ satisfies the required conditions. The first and the third parts of the claim are obvious. Note that for any $x, y \in S$, $\|x - y\|_1 \ge (d'/9) \cdot \lfloor 1/(4\epsilon) \rfloor \ge d'/(40\epsilon) = n/10$. The penultimate inequality uses that $1/(4\epsilon) \ge 10$. ■

The above claim worked for e < 1. We now prove a claim which works in the regime of e > 1. 

Claim A. 3 There exists C > such that for any d,d' e N, e > 1 and d' < d/(C-2^^) , n = d'/{4e), 
there exists S C Z"*" such that the following conditions hold: 

• yx £ S, \\x\\i < n 
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• Vx, y £ S and x ^ y, \\x — y\\i > n/10 

• \S\> 24'^'/5 

Further, S is in fact a subset of {0, 1}'^. 

Proof: We observe that the construction of set S is related to the construction of combinatorial 
designs j PauSSj IRRV99J with specific parameters. In particular, the result in |Pau85j allows us to 
construct sets ^i, . . . , 5"^ C [d] with the following properties : 

• Vi, \Si\ < [n\ 

• Vi/ j, \Sir\Sj\ <p< [4n/5j 



provided that d > 



(for some large constant C). Using the conditions on d,d' and n, 



we see that the condition is satisfied provided C is sufficiently large compared to C". Clearly, if 
xi, . . . , Xm are characteristic vectors of the sets S*!, . . . , Sm respectively, then 



b — \X\, . . . , XyriS 



satisfies the conditions of our claim. 



B Large deviation of Rademacher sums from their mean 

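In this section, we prove the following inequality, which says that Rademacher sums have large deviations from their mean with significant probability.

Theorem B.1 Let $x_1, \ldots, x_d$ be i.i.d. $\pm 1$ random variables such that $\mathbb{E}[x_i] = 0$. Also, let $a_1, \ldots, a_d \in \mathbb{R}^+$ be such that $\sum_{i=1}^{d} a_i = n$. Then, for any $0 < \theta < n/2$,

$$\Pr_{x_1,\ldots,x_d}\left[\, \left|\sum_{i=1}^{d} a_i x_i\right| > \theta \,\right] \ge \exp\left(-\frac{2d\theta^2}{n^2}\right).$$

An immediate application of the above theorem is the following corollary.

Corollary B.2 Let $x_1, \ldots, x_d$ be i.i.d. $\pm 1$ random variables such that $\mathbb{E}[x_i] = 0$. Also, let $a_1, \ldots, a_d \in \mathbb{R}^+$ be such that $\sum_{i=1}^{d} a_i = n$. Then,

$$\Pr_{x_1,\ldots,x_d}\left[\, \left|\sum_{i=1}^{d} a_i x_i\right| \ge \frac{n}{10\sqrt{d}} \,\right] \ge \frac{9}{10}.$$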
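
For small $d$, the tail probability of a Rademacher sum can be computed exactly by enumerating all sign patterns, which is a convenient way to see how the constants in bounds of this form behave on concrete instances. The sketch below is illustrative; the function name and the example weights are assumptions, and exhaustive enumeration is feasible only for small $d$.

```python
from itertools import product

def rademacher_tail(weights, theta):
    """Exact Pr[|sum_i a_i x_i| > theta] for uniform signs x in {-1, 1}^d."""
    d = len(weights)
    hits = sum(1 for signs in product((-1, 1), repeat=d)
               if abs(sum(a * s for a, s in zip(weights, signs))) > theta)
    return hits / 2 ** d

tail = rademacher_tail([3, 2, 1], theta=1)
tail_equal = rademacher_tail([1, 1, 1], theta=0.5)
```

Small instances are only a sanity check: the sum can vanish with noticeable probability when the weights cancel, so the exact constants in a deviation bound matter at this scale.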



To prove Theorem B.1, we use the following result due to Montgomery-Smith [MS90]. For $y \in \mathbb{R}^d$ and $t > 0$, we define

$$K(y,t) = \inf_{y' \in \mathbb{R}^d} \left\{ \|y - y'\|_1 + t\|y'\|_2 \right\}.$$

The following theorem about large deviation of Rademacher sums from the mean was proven by Montgomery-Smith [MS90].

Theorem B.3 Let $x_1, \ldots, x_d$ be i.i.d. $\pm 1$ random variables such that $\mathbb{E}[x_i] = 0$. Then,

$$\Pr_{x_1,\ldots,x_d}\left[\, \sum_{i=1}^{d} y_i x_i > K(y,t) \,\right] \ge \exp(-t^2/2).$$

To use Theorem B.3 in order to prove Theorem B.1, we make the following claim.

Claim B.4 Let $y \in \mathbb{R}^d$ be such that $\|y\|_1 = n$. Then, $K(y,t) \ge \min\{n/2, nt/(2\sqrt{d})\}$.

Proof: Note that $K(y,t) = \inf_{y' \in \mathbb{R}^d} \{\|y - y'\|_1 + t\|y'\|_2\}$. Consider $y'$ which achieves the infimum. If $\|y'\|_2 \ge n/(2\sqrt{d})$, then we are done. Else, we get that $\|y'\|_1 < n/2$ (by Cauchy-Schwarz). Then, by the triangle inequality, we get that $\|y - y'\|_1 > n/2$. ■

The theorem immediately follows by plugging the lower bound on $K(y,t)$ from the above claim into Theorem B.3.

C Concentration of measure for the sum of squares of Gaussians 



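In this section, we prove a result about the concentration of measure for the sum of squares of i.i.d. $\mathcal{N}(0,\sigma)$ random variables. While this seems to be a well-studied distribution in the literature, we could not find a usable result on its concentration, and hence we prove the following theorem here.

Theorem C.1 Let $X_1, \ldots, X_k$ be $k$ i.i.d. $\mathcal{N}(0,\sigma)$ random variables. Then,

$$\Pr_{X_1,\ldots,X_k}\left[\, X_1^2 + X_2^2 + \cdots + X_k^2 \ge 2(1+\eta)k\sigma^2 \,\right] \le 2^{-\eta k}.$$

Proof: Note that by definition, for any $i \in [k]$,

$$\Pr[X_i = t] = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{t^2}{2\sigma^2}} \, dt.$$

Then, consider the random variable $Z_i = \exp\left(\frac{X_i^2}{4\sigma^2}\right)$. We note that

$$\mathbb{E}[Z_i] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{t^2}{2\sigma^2}} \cdot e^{\frac{t^2}{4\sigma^2}} \, dt = \int \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{t^2}{4\sigma^2}} \, dt = \sqrt{2}.$$

Now, observe that

$$\Pr_{X_1,\ldots,X_k}\left[ X_1^2 + \cdots + X_k^2 \ge \lambda \right] = \Pr_{X_1,\ldots,X_k}\left[ \exp\left(\frac{X_1^2 + \cdots + X_k^2}{4\sigma^2}\right) \ge \exp\left(\frac{\lambda}{4\sigma^2}\right) \right] \le \frac{\mathbb{E}_{X_1,\ldots,X_k}\left[\exp\left(\frac{X_1^2 + \cdots + X_k^2}{4\sigma^2}\right)\right]}{\exp\left(\frac{\lambda}{4\sigma^2}\right)}.$$

Using independence of the $X_i$'s, we get that the above expression is

$$\frac{\prod_{i=1}^{k} \mathbb{E}\left[\exp\left(\frac{X_i^2}{4\sigma^2}\right)\right]}{\exp\left(\frac{\lambda}{4\sigma^2}\right)} = \frac{\prod_{i=1}^{k} \mathbb{E}[Z_i]}{\exp\left(\frac{\lambda}{4\sigma^2}\right)} = \frac{2^{k/2}}{\exp\left(\frac{\lambda}{4\sigma^2}\right)}.$$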
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Putting A = 2(1 + ri)ka'^, we get that the above expression 



IS 



2fc/2 2^/2 2^/2 ^k 

< „ . .. < 2"^ 



e-P (i^) exp (Hii±^) - 2^^ 
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